





Journal of Applied Psychology 


The Journal of Applied Psychology is published bi-monthly by the American
Psychological Association, Inc. The annual subscription rate is [amount illegible]; single
copies, $1.25. Subscriptions, orders, and business communications should be
addressed to the American Psychological Association, Inc., 1515 Massachusetts
Avenue, Northwest, Washington 5, D. C.


The character of the articles desired and the policy of this Journal with respect
to subject matter is as follows: Reports of original investigations in any field of
applied psychology (except clinical and consulting psychology) are preferred. An
occasional descriptive or theoretical article, however, will be accepted if it deals
with some phase of applied psychology in a distinctive manner. The policy, however,
is to favor papers dealing with quantitative investigations that would be of
direct value to applied psychologists working in the following fields: vocational
diagnosis and occupational guidance; educational diagnosis, prediction and guidance
at the secondary school level and higher; personnel selection, training, placement,
transfer and promotion in business, industry and government service, including the
armed forces; training supervisors in business, industry and government service;
problems of illumination, ventilation, fatigue, etc. in business and industry; job
analysis, description, and evaluation; employee morale measurement; applied social
and political psychology (such as opinion surveys conducted by The Psychological
Corporation); market research and advertising. Articles under 500 words may be
accepted. The maximum limit would be 16,000 words. The average would probably
be in the neighborhood of 4,000 words. In order to reduce lag of publication,
brevity consistent with clarity is encouraged.


With rare exceptions, articles are published in the order of their receipt.


The policy of “early publication,” the author paying complete costs, is followed.
The scheduled 8 pages per issue is thereby increased by the corresponding amount;
thus the “early publication” of an article is a direct contribution to the subscribers
without handicap to those authors whose articles are accepted in their regular turn.





Journal of Applied Psychology 








Vol. 31, No. 3 June, 1947








Some Milestones in Public Opinion Research * 


Henry C. Link 
The Psychological Corporation, New York, N. Y. 


Some milestones in public opinion research, such as the Gallup and 
Fortune polls and the debacle of the Literary Digest poll, are so obvious 
that merely to mention them will bring general recognition. Others, 
for one reason or another, would not be generally recognized unless they 
were pointed out or described. Among the latter, the complete history of 
this field will surely include the pioneer commercial research during the 
decade of 1920 to 1930. This was the decade during which many of 
the basic techniques of public opinion research were created by such 
practical, and at the same time highly imaginative, research workers as 
Charles C. Parlin, R. O. Eastman, D. P. Smelser, Percival White, Pauline 
Arnold, Joseph W. Hayes, Daniel Starch, Archibald Crossley, Frank 
Surface, to mention but a few.¹ During this decade J. David Hauser
developed the large-scale application of attitude surveys among em- 
ployees and consumers for department stores, public utilities and a great 
variety of business organizations. 

The fact that much of this early research was devoted to the study 
of people’s opinions about magazines, about food products, about soaps 
and many other things of everyday life should not obscure its great im- 
portance to the field of public opinion research. After all, opinions are 
opinions. It may be that opinions on political and social issues are 
more important than opinions about automobiles, cigarettes, coffees, 
advertisements or telephone services. Nevertheless, the techniques of 
obtaining opinions reliably were furthered by the comparatively non- 
partisan nature of the commercial field. 

* This article was written for and published in the first issue of the International
Journal of Opinion and Attitude Research, a journal inaugurated in March, 1947, under
the editorship of Professor Laszlo Radvanyi of the National School of Economics in the
National University of Mexico. The article is published here with Professor Radvanyi’s
permission and is a “prior publication,” with complete costs paid.
¹ Other important names in the field of commercial research include those of Paul T.
Cherington, Walter Mann, A. W. Lehman, Ray Robinson, Eldrich Hayes, W. J. Reilly,
E. K. Strong, Jr., Gerald Tasker, Victor Pelz, Louis Weld, Robert King, Henry Weaver,
Everett Smith, John Drescher. This by no means exhausts the list.




The outstanding developments of this early research were: (1) The
use of the formal, standardized questionnaire; (2) The reliance on personal
face to face interviews; (3) The use of large samples of people properly
distributed. Heretofore, heavy reliance had been placed on mailed
questionnaires, and on occasional personal interviews made informally.
Most of the opinion research by professional psychologists during this
period, such as that by A. T. Poffenberger and H. L. Hollingworth, had
been done with students and in the classroom. Dr. Daniel Starch,
whose early research at Harvard was also of this nature, was probably the
first professional psychologist to open his own office and devote himself
entirely to market research (New York, 1926). Here he pioneered in
opinion research on advertising and developed his nation-wide surveys
of magazine readership. Notable during this decade were the early
experiments of L. L. Thurstone in the measurement of attitudes through
the use of multiple item attitude scales (22).

Intensive opinion research by psychologists did not fully get under
way until the decade 1930 to 1940. In the recent review of the literature 
of opinion-attitude methodology by McNemar (19) only five of the 133 
titles of research mentioned were dated between 1920 and 1929; forty 
were dated between 1930 and 1939, and ninety-eight between 1940 and 
1946. 




The Psychological Barometer 

In March 1932, the writer, with the aid of fifteen psychologists 
associated with The Psychological Corporation, made a survey in fifteen 
cities and towns throughout the country. Personal interviews were made 
in 1578 homes. The report of this survey in The Harvard Business Review 
of January, 1933 (13) gives complete details as to methods, results, and 
the statistics by each of the fifteen subsamples. Since then, there have 
been 89 Psychological Barometer surveys totaling in excess of 570,000 
personal interviews. Probably the earliest continuous use of a public
opinion question in a series of nation-wide surveys made entirely with
personal interviews is recorded in the following results from six Psychological
Barometer surveys (14).


Question: From what you have seen of the National Recovery Act in your
neighborhood, do you believe it is working well?








Answers  Oct. 1933  Nov. 1933  Jan. 1934  April 1934  Sept. 1934  Jan. 1935
         %          %          %          %           %           %
Yes 48 41 55 50 38 
No 27 30 22 23 26 


Uncertain 25 29 23 27 36 


Total Interviews 







At a time when Gallup was still experimenting in his Institute of 
Public Opinion with a combination of personal and mail interviews, the 
Psychological Barometer was being conducted according to these assumptions:

1. A proper sampling of the population could be assured only through
personal interviews made, for the most part, in the home.
2. The assignment of interviewers to specific areas under the direction
of a local psychologist or supervisor was more reliable than the
system by which each interviewer was allowed to select his own “quota.”
3. A large number of naive interviewers doing about 25 interviews
each was generally more desirable than a small number of sophisticated
interviewers doing two or three hundred interviews each. The former
were less likely to become stereotyped or biased in their work. The
spread of the sample was also facilitated.


The Psychological Barometer, which has been self-supporting since 
1934, is now conducted four times a year with 10,000 personal interviews 
and twice a year with 5,000 personal interviews. The interviews are 
made in about 125 cities and towns throughout the United States and 
constitute a good cross-section of the urban population. 


How Many Interviews Are Necessary for Results of a Certain Accuracy? 


In the early days of market and public opinion research, the deter- 
mination of how many interviews were necessary for reliable data was 
usually decided empirically. For example, the results of the first hun- 
dred interviews were compared with the results of a second hundred 
interviews, and if there was a substantial difference another hundred 
interviews were made and compared with the results of the first two 
hundred. This process was continued until, in the terms of research men 
at that time, the results “became stabilized.” That is to say, the results
were considered stable when the addition of more interviews did not pro- 
duce large differences. Naturally, the point of stabilization was a matter 
of arbitrary judgement. 
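
The empirical procedure described above can be sketched in a few lines. This is only an illustration of the idea, not a record of how any research office actually worked: the function name `interviews_until_stable`, the batch size of 100, and the `tolerance` of two percentage points are all assumptions introduced here, the last standing in for what the text calls a matter of arbitrary judgement.

```python
import random

def interviews_until_stable(responses, batch=100, tolerance=2.0):
    """Add interviews one batch at a time and stop at the first batch whose
    addition moves the cumulative percentage by less than `tolerance` points.

    `responses` is a list of 1 (yes) / 0 (no) answers in interview order.
    `tolerance` is a hypothetical threshold; choosing it is the arbitrary step.
    Returns (number of interviews used, cumulative per cent answering yes).
    """
    previous = None
    yes = 0
    n = 0
    for start in range(0, len(responses) - batch + 1, batch):
        chunk = responses[start:start + batch]
        yes += sum(chunk)
        n += len(chunk)
        current = 100.0 * yes / n
        if previous is not None and abs(current - previous) < tolerance:
            return n, current
        previous = current
    return n, previous  # the results never stabilized within the data supplied

# Simulated interviews; the 48 per cent "yes" rate is an assumed figure.
random.seed(1947)
answers = [1 if random.random() < 0.48 else 0 for _ in range(5000)]
n_used, pct = interviews_until_stable(answers)
print(n_used, round(pct, 1))
```

A looser tolerance stops the survey sooner and a tighter one later, which is exactly why the stopping point depended on the judgement of the research man.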

Some statisticians familiar with this field pointed out that the estab- 
lished laws of chance as formulated by Bernoulli, Mills, Pearson, Fisher 
and others were applicable to interview data. Professor T. H. Brown of 
the Harvard Business School published a table in 1935 (4) showing the 
probable error with samples of various size as based on the accepted for- 
mula for standard deviation. However, probably the first empirical 
demonstration that this formula applied to the polling of consumer re- 
actions was made in June-July 1934, with a Psychological Barometer 
survey of 5165 interviews. It was made under the direction of Dr.
Irving Lorge of Teachers College and Philip G. Corby of The Psychological
Corporation staff.

The interview data were punched on 5165 Hollerith cards. Those
cards were then separated into fifty-one samples of one hundred each.
When the answers to a variety of questions were computed by these
fifty-one small samples, it was found that no matter what the question
or what the size of the results they corresponded with the Bernoulli
distribution. In short, the theoretical prediction of error according to the
binomial distribution and associated probabilities was demonstrated
empirically. The application of these findings to the decisions on how
many interviews are necessary in one-time and in periodic studies, under
varying practical requirements, is described in the report on this study
(15).
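
The logic of this check can be imitated with simulated data. The sketch below does not use the 1934 survey results, which are not reproduced here. It generates 5165 hypothetical answers, splits them into fifty-one samples of one hundred, and compares the observed spread of the sample percentages with the spread the binomial formula predicts. The names `split_samples` and `theoretical_sd` and the assumed 40 per cent "yes" rate are illustrative only.

```python
import math
import random
import statistics

def split_samples(cards, size=100):
    """Cut the cards into consecutive full samples of `size`; a remainder is dropped."""
    return [cards[i:i + size] for i in range(0, len(cards) - size + 1, size)]

def theoretical_sd(p, n):
    """Standard deviation, in percentage points, of a sample percentage
    under the binomial model: 100 * sqrt(p * (1 - p) / n)."""
    return 100.0 * math.sqrt(p * (1.0 - p) / n)

random.seed(1934)
true_p = 0.40  # assumed share answering "yes"; not a figure from the study
cards = [1 if random.random() < true_p else 0 for _ in range(5165)]

samples = split_samples(cards)                      # fifty-one samples of 100
percents = [100.0 * sum(s) / len(s) for s in samples]
observed_sd = statistics.stdev(percents)            # spread actually found
expected_sd = theoretical_sd(sum(cards) / len(cards), 100)  # spread predicted

print(len(samples), round(observed_sd, 2), round(expected_sd, 2))
```

If the two figures printed last are close, the small samples are behaving as the binomial distribution says they should, which is the kind of agreement the study reported.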

In recent years the arbitrary method of determining the number of
interviews by the point at which results stabilize has been generally abandoned
in favor of the less cumbersome and less expensive method of mathematical
prediction. For the better understanding of the layman, some
pollsters have popularized the claim that survey results are accurate 
within three or four per cent. This is a popular fiction quite contrary to 
the statistics of sampling. The margin of permissible error in a sample 
of one hundred interviews for a value of fifty per cent is from zero to 
fifteen per cent, whereas in a sample of 10,000 it is from zero to one and 
one-half per cent. 
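
The two margins quoted above are consistent with a limit of three standard errors of a percentage. That reading is an inference from the numbers and is not stated in the text. Under that assumption the arithmetic can be sketched as follows; the function names `max_error` and `required_n` and the default `k=3.0` are introduced here for illustration.

```python
import math

def max_error(p, n, k=3.0):
    """Upper limit of sampling error in percentage points, taken as k standard
    errors of a proportion p in a sample of n. k=3 is an assumed convention."""
    return 100.0 * k * math.sqrt(p * (1.0 - p) / n)

def required_n(p, error_points, k=3.0):
    """Smallest sample size for which k standard errors stay within
    `error_points` percentage points."""
    return math.ceil(p * (1.0 - p) * (100.0 * k / error_points) ** 2)

for n in (100, 1000, 10000):
    print(n, round(max_error(0.5, n), 2))

print(required_n(0.5, 3.0))
```

With a value of fifty per cent the function gives fifteen points for 100 interviews and one and one-half points for 10,000, matching the figures in the text. The same formula shows why a blanket claim of "three or four per cent" cannot hold for every sample size.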

The Literary Digest Poll 


In the Presidential election of November, 1936, the Literary Digest 
Magazine poll demonstrated conclusively that, no matter how large the 
sample or the number of interviews, short of a complete census, it can 
produce a highly inaccurate result. The Literary Digest had mailed 
ten million post card ballots asking people to state the candidate for whom 
they would vote. Of these, 2,376,523 were returned. From this large 
sample the Literary Digest predicted that Franklin D. Roosevelt would 
get 43 per cent of the major party vote. Actually, he received 60.7 per 
cent of the total vote. In previous elections this same method used by 
the Literary Digest had proved fairly accurate. However, in the 1936 
election for the first time the traditional split between Republicans and 
Democrats was affected by a major split along economic or class lines. 
The lower income groups voted overwhelmingly for Mr. Roosevelt. An 
analysis of the Literary Digest method revealed that its ballots had been
mailed to and returned by a disproportionately large number of the 
upper income groups and the better educated. The lower income groups 
who gave Mr. Roosevelt such a large majority were not adequately 
sampled by the Literary Digest method. 
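
Why sheer size cannot rescue such a sample can be shown with a small calculation. Every figure below is hypothetical and chosen only to make the mechanism visible; none comes from the Literary Digest returns. Each group is given an assumed share of the electorate, an assumed level of support for the candidate, and an assumed `coverage`, meaning the relative chance that a member of the group both receives and returns a ballot.

```python
def true_share(groups):
    """Candidate's share of the whole electorate."""
    return sum(g["population"] * g["support"] for g in groups)

def polled_share(groups):
    """Candidate's share among returned ballots, when groups are reached
    at unequal rates. The result does not depend on how many ballots are mailed."""
    reached = sum(g["population"] * g["coverage"] for g in groups)
    favorable = sum(g["population"] * g["coverage"] * g["support"] for g in groups)
    return favorable / reached

# Hypothetical electorate: all numbers are illustrative assumptions.
groups = [
    {"name": "lower income", "population": 0.70, "support": 0.70, "coverage": 0.20},
    {"name": "upper income", "population": 0.30, "support": 0.35, "coverage": 1.00},
]

print(round(100 * true_share(groups), 1))
print(round(100 * polled_share(groups), 1))
```

Because the number of ballots cancels out of `polled_share`, mailing ten million cards instead of ten thousand leaves the estimate where it was. Only a change in coverage, that is, in how the sample is distributed, moves it toward the true figure.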




The Fortune, Gallup and Crossley Polls 


In the same elections, November, 1936, the Fortune Magazine poll 
predicted that Roosevelt would get 61.7 per cent of the total vote. 
Actually he received 60.7 per cent. The Fortune poll was based on about 
3500 personal interviews. The poll conducted by Archibald Crossley 
predicted a 53.8 per cent vote for Roosevelt. The Gallup poll predicted
a vote of 53.8 per cent for Roosevelt.

The first Fortune Magazine poll appeared in July, 1935. It was the 
brain child of the late Paul T. Cherington, one of the pioneers in market 
research. While following at the outset the pattern set by the Psychological
Barometer in asking questions about consumer buying habits as
well as questions on social issues, its sample included not only the
urban population but the farm population as well. After a short period,
Elmo Roper assumed complete direction of this poll, which is now a 
regular feature of Fortune Magazine and is devoted entirely to social 
and political issues. 

The American Institute of Public Opinion, popularly known as the 
Gallup Poll, made its first appearance as a syndicated feature in a group 
of newspapers in October, 1935. Its first release dealt with a question 
on government spending. At this time Dr. Gallup was still using a 
combination of personal interviews and mailed ballots. 

Fortunately for public opinion research on political issues, it soon 
became necessary for these polls to cope with the problem of predicting 
election results. If the opinions which people expressed about their 
voting intentions could not be measured accurately, and if their intentions
were not reflected in their actual voting behavior, then the validity of 
polls in regard to other political questions would also have been in serious 
doubt. The Literary Digest debacle in 1936, and the more accurate 
predictions of the Fortune, Crossley and Gallup polls, made a dramatic 
issue of the question of validity. The 1936 experience showed that valid 
results could be obtained if the poll were made with a properly distributed 
sample, such as could be best obtained by personal interviewers who 
would complete interviews with the people designated. 

Less dramatic but more important demonstrations of validity, from a 
scientific point of view, were those made in the commercial field. For 
example, more people in the United States vote for (that is, buy) Coca- 
Cola or Campbell’s soup every few days than vote for a President once in 
four years. The economic system of the United States as in other free 
economies is based on the practice of democracy in the market place, 
and the institution of advertising represents a continuous political cam- 
paign by the makers of specific products and brands (3). Therefore, 
since the early days of research the commercial research men surveying
the market on various products had been required to validate their
results with the known facts about the sale of those products. Two of
the pioneer scientific studies in this field were those made by the
psychologist, J. G. Jenkins, in 1937, on the “Dependability of the Psychological
Barometers” (9, 10). A further discussion summing up various
phases of this subject is that by Link and Freiberg on “The Problem of
Validity vs. Reliability in Public Opinion Polls” (16).


Milestones in the Literature 


The Public Opinion Quarterly, inaugurated in 1937 as “an independent,
nonprofit journal subsidized solely by the School of Public Affairs of
Princeton University,” is certainly a milestone in the literature of opinion
research. Many other journals had for years before the advent of
the Quarterly published papers on opinion research, and continue to do
so. Prominent among them are the Journal of Applied Psychology,
Journal of Social Psychology, American Sociological Review, American
Political Science Review, Journal of Abnormal and Social Psychology, and
others. The Public Opinion Quarterly, however, with an editorial board
including the leaders in public opinion research has devoted itself exclusively
to this field. About equal emphasis is given to experimental research
and to theory and interpretations. An important feature of this
journal is its quarterly compilation of poll results released by the American
Institute of Public Opinion, the British, Canadian, Danish and
French Institutes, the Fortune poll, the National Opinion Research
Center and others. Today the ten volumes of the Quarterly represent
the most comprehensive record in the field of public opinion research.

A conspicuous milestone in the development of public opinion re- 
search is the interpretation of such research in terms of political theory 
and democratic practice. The book, The Pulse of Democracy (8), pub- 
lished in 1940 by George Gallup, a psychologist, and Saul Forbes Rae, 
a sociologist, unquestionably represents such a milestone. This book de- 
scribes some of the methods of opinion research but is notable chiefly 
for its discussion of such research as an instrument in the preservation and 
perfection of democracy. It is also a milestone in the field of social 
psychology because, whereas previous books on social psychology had 
placed heavy emphasis on speculative and philosophical theories, The
Pulse of Democracy based all its theorizing upon quantitative measures
and objective research. 


Another milestone in the literature of public opinion research is the 
textbook, Consumer and Opinion Research by the psychologist, A. B. 
Blankenship (2). This is the most comprehensive textbook in this relatively
new field. It is an excellent description and evaluation of techniques
in many areas. The most recent book in this field is How to Conduct
Consumer and Opinion Research, edited by Albert B. Blankenship
(3). This includes among its contributors the psychologists Freiberg,
Lazarsfeld, Katz, Wulfeck, Welch, all of them leaders in this field, as
well as such eminent business research men as Hooper, Barton, DuBois
and Haring.

The literature is now too voluminous to permit extensive mention here. 
However, two books which really should be noted are Cantril’s Gauging 
Public Opinion (5) and Lazarsfeld’s The People’s Choice (11). The former
is a discussion of various developments and problems in opinion research.
The latter is based on original research in the changes occurring in people’s 
voting intentions during a political campaign. 


The Experimental Development of the Questionnaire 


An outstanding development concentrated largely in the 1930 to 1940
decade was the experimental development of the questionnaire. The
belief that a questionnaire could be written by an individual or by a
committee of experts and then put into the field for a survey was almost
completely discarded. Instead, a trial questionnaire was first tested by
several trained interviewers making from ten to twenty interviews each.
From this experience the questionnaire was revised and tested again.
Sometimes as many as ten or fifteen separate tests of this sort were made
with samples of various sizes and various kinds, before a satisfactory
questionnaire was arrived at. This process, described by the author in
1932 as the “test tube” procedure (17), may take from a few weeks to
several months, whereas a complete survey of 10,000 or 20,000 interviews
can be made with complete report in two months or less. The
term “pilot study” has become popular in recent years to designate one
phase of the test-tube procedure. It means a small scale study preceding
the final or larger survey.


Influence of the Question Form on Poll Results 


As a result of the “test tube” procedure and other tests with the 
wording of questions, it soon became obvious that the wording of ques- 
tions did have an influence on the answers which people gave when inter- 
viewed. Many experiments with this phenomenon were made or described 
between the years 1938 and 1941 by Roper (20), Blankenship (1), Roslow, 
Wulfeck and Corby (21), Link and Freiberg (16), Link (15) and others. A 
remarkable outcome of these experiments was the discovery that, even when 
questions were deliberately worded to obtain different results, the differences
were often of a minor nature. If a question was worded simply
enough so that it could be understood, and if it dealt with a problem on
which people were well informed, the exact wording of the question
seemed to be of secondary importance.

There is at present a discussion among psychologists bearing on this
subject. McNemar, in his review of “Opinion-Attitude Methodology”
(19) draws the conclusion that attitude scales consisting of multiple questions
all bearing on the same subject give a more reliable result than
single questions. Therefore, he recommends that “single question
opinion gauging be discarded in favor of opinion measurement by attitude
scales.” In a reply, Leo P. Crespi of Princeton University points out
the demonstrated validity of “single question” results (6) and gives
supporting evidence. Enough evidence now exists, in the author’s
opinion, to justify the conclusion that single question results, both on
factual matters and matters of opinion, can have as high a reliability
and validity as the results of multiple question or attitude scales. This
evidence strongly supports the classical concept of the psychology of
attitudes, namely, that an attitude represents a nexus or a fund of ideas,
impressions and feelings, which tends to express itself in much the same
way to a variety of stimuli. Certainly this would explain the similarity
in the answers which differently worded questions on the same basic
subject so often produce.

Another aspect of this problem is the unsettled controversy over the 
informal depth interview which may take three or four hours and the 
formal interview with definitely worded questions which may be asked 
and answered in a few minutes. Outstanding among the exponents of the
former is the psychologist, Paul F. Lazarsfeld, who made extensive use 
of depth interviews in commercial surveys in Vienna in the years around 
1930, and in various types of research in the United States since 1935 
(12). The writer is among those who, while using depth interviews in the 
exploratory stages, maintains that a single definitely worded question can 
often evoke answers as valid as can be evoked by a whole series of ques- 
tions on the same subject (18). 


International Polls of Public Opinion 


In 1937 the first results of the British Institute of Public Opinion
appeared. The British Institute was inaugurated under the direction
of Dr. George Gallup as an affiliate of the American Institute of Public
Opinion. Since then the following national polls have begun operation:
Canadian Institute of Public Opinion; Australian Public Opinion Polls;
Institut Français d’Opinion Publique; Norsk Gallup Institutt; Svenska
Gallup Institutet; Dansk Gallup Institut; Suomen Gallup Oy; and Instituto
Brasileiro de Opinião Pública e Estatística. All of these public
opinion institutes have been inaugurated under the inspiration of Dr.
Gallup and the American Institute of Public Opinion, of which they are
affiliates. However, each operates independently in its own field. The
results of several of these institutes appear periodically in the Public Opinion
Quarterly.


The Scientific Institute of Mexican Opinion 


In 1942 Professor Laszlo Radvanyi organized the Scientific Institute
of Mexican Opinion. This Institute is making periodic surveys in the
Federal District and in the large cities of the country. In 1944, Dr. 
Radvanyi founded the Institute for Studies in Social Psychology and 
Public Opinion, which makes regular international surveys among profes- 
sional people, psychologists, economists, educators. The Institute is 
now preparing an International Directory of the names prominent in 
opinion and attitude research. 


The National Opinion Research Center 


In 1941 the National Opinion Research Center was established at 
the University of Denver. It was and is financed by the Field Found- 
ation, Inc., and the University of Denver, but otherwise operates much 
like a commercial organization. Its periodic polls of public opinion are
released to the newspapers but its reports may be obtained regularly by
anyone at stated subscription rates. During the war and following, the 
NORC did considerable research for government agencies on a contrac- 
tual basis. Probably its most important contribution to research tech- 
niques consists of a series of studies in 1942 and 1943 on the wording of 
questions. 

Other universities have also established bureaus for opinion or con- 
sumer research, including the Bureau of Applied Social Research at 
Columbia University, under the direction of psychologist Paul F. Lazars- 
feld, and quite recently, the Survey Research Center at the University of 
Michigan under the direction of psychologist Rensis Likert. 

A complete account of milestones in this field will probably include 
the work of Elmo Roper on people’s knowledge of the subject matter on 
which they were being questioned; the development of sampling methods; 
the studies on interviewer bias; a considerable range of surveys made 
by psychologists, sociologists and political scientists for government 
agencies, and the Army and Navy during the war. It will also include 
an account of the poll of experts developed by Dr. Arthur W. Kornhauser 
and published in The American Magazine during 1945 and 1946, as well 
as the poll of experts conducted by Emily L. Ehle under the title, “The
American Leadership Panel,” and issued as a service obtainable on
subscription.







References 


1. Blankenship, A. B. The influence of the question form upon the response in a
   public opinion poll. Psychol. Rec., 1940, 3, 349-422.
2. Blankenship, A. B. Consumer and opinion research. New York: Harper & Bros.,
   1943.
3. Blankenship, A. B., editor. How to conduct consumer and opinion research. New
   York: Harper & Bros., 1946.
4. Brown, T. H. Use of statistical techniques in certain problems of market research.
   Cambridge: Harvard University Press, 1935.
5. Cantril, H. Gauging public opinion. Princeton: Princeton University Press.
6. Crespi, Leo P. Opinion-attitude methodology, and polls—a rejoinder. Psychol.
   Bull., 1946, 43, 6, 562-569.
7. Freiberg, A. D., in the summary of panel discussions held on the twentieth
   anniversary of The Psychological Corporation. J. appl. Psychol., 1942, 26, 10-23.
8. Gallup, George and Rae, Saul Forbes. The pulse of democracy. New York: Simon
   & Schuster, 1940.
9. Jenkins, J. G. Dependability of psychological brand barometers: I. The problem
   of reliability. J. appl. Psychol., 1938, 22, 1-7.
10. Jenkins, J. G., and Corbin, H. H. Dependability of psychological brand
    barometers: II. The problem of validity. J. appl. Psychol., 1938, 22, 250-252.
11. Lazarsfeld, P. F., Berelson, B., and Gaudet, H. The people’s choice. New York:
    Duell, Sloan and Pearce, 1944.
12. Lazarsfeld, P. F. The controversy over detailed interviews—an offer for
    negotiation. Publ. Opin. Quart., 1944, 8, 36-60.
13. Link, H. C. A new method of testing advertising effectiveness. Harv. bus. Rev.,
    1933, 11, 165-177.
14. Link, H. C. A new method for testing advertising. J. appl. Psychol., 1934, 18,
    1-26.
15. Link, H. C. How many interviews are necessary for results of a certain accuracy?
    J. appl. Psychol., 1937, 21, 1-17.
16. Link, H. C., and Freiberg, A. D. The problem of validity vs. reliability in public
    opinion polls. Publ. Opin. Quart., 1942, 6, 87-98.
17. Link, H. C. The new psychology of advertising and selling. New York: Macmillan,
    1932, Chapter 3.
18. Link, H. C. An experiment in depth interviewing on the issue of internationalism
    vs. isolationism. Publ. Opin. Quart., 1943, 7, 267-279.
19. McNemar, Quinn. General review and summary of opinion-attitude methodology.
    Psychol. Bull., 1946, 43, 4, 289-374.
20. Roper, E. Wording of questions for the polls. Publ. Opin. Quart., 1940, 4, 129-130.
21. Roslow, S., Wulfeck, W. H., and Corby, P. G. Consumer and opinion research:
    experimental studies on the form of the question. J. appl. Psychol., 1940, 24,
    334-346.
22. Thurstone, L. L. Attitudes can be measured. American J. Sociol., 1928, 33,
    529-554.







Postscript to Predicting Success in Machine Bookkeeping * 


Edward N. Hay 


The Pennsylvania Company, Philadelphia 


In December 1943, a successful experiment in validating tests for
predicting success in machine bookkeeping for a large commercial bank
was reported.¹ Since 1941, when that experiment was completed, repeated
measures have been made of the performance of individual bookkeepers.
The data show that there has been a substantial improvement
in production year by year, as a result of the employment of operators
selected by test. Table 1 gives the detailed record chronologically.


Table 1

Production Rates of Machine Bookkeepers
Burroughs Index of Production Rate

Date              N     Production
October 1937      43    105.0
November 1939           108.7
April 1940        30    109.7
December 1940           110.2
December 1941     26    110.9
December 1943     29    116.7
June 1944         35    113.0
December 1945     29    108.1
June 1946         29    111.6





N in each case is the number of bookkeepers who have been on the
machine at that time for one year or longer. The decline in performance
from a high of 116.7 in 1943 to a low of 108.1 in December 1945
was the result of a smaller supply of applicants with acceptable test
scores. The subsequent increase to 111.6 in June 1946 reflects the improvement
in the supply of higher test score operators since the end of the
war.


Received February 10, 1947. 


* This is an “early publication,” the author paying complete costs.
¹ Hay, Edward N. Predicting success in machine bookkeeping. J. appl. Psychol.,
1943, 27, 483-493.







The Reliability of Visual Acuity Scores Yielded by 
Three Commercial Devices * 


J. H. Sulzman, M.D. 


Troy, New York 


Lt. Comdr. E. B. Cook, H(S) USN 


Medical Research Laboratory, New London, Conn. 


and N. R. Bartlett 


Johns Hopkins University 


Compact optical devices for measuring visual acuities and muscle
imbalances are now available commercially. These instruments are
suited especially to situations where time and space factors are at a premium.
The U. S. Navy had need for such devices. Accordingly, the
writers have evaluated the Keystone “Telebinocular,” the American
Optical “Sight-Screener” and the Bausch and Lomb “Ortho-Rater” for
their relationship to prescribed Navy procedures. Details of the investigation
have been described in a report (1) to the Bureau of Medicine
and Surgery of the U. S. Navy. This note presents a summary of
reliability data for the acuity scales of the instruments, together with
information on the correlation of these measures with acuity scores with
standard Snellen-type letters.

Each of the three instruments affords a score for acuities at the 
optical equivalent for remote distance and at the normal distance em- 
ployed in reading. Examinations for acuity at proximate distances have 
not been a general Navy selection practice, so in this report there is no 
prescribed standard test for short distance with which to compare the 
instruments. However, it may be assumed that miniatures of the charts 
used for acuity at twenty feet could be utilized as comparable acuity 
measures at the reading distance, and that assumption was made for 
this study. Photographic miniatures of certain distance charts were 
used at a reading distance of fourteen inches to derive a score for proxi- 
mate acuity. 

* This evaluation was conducted at the Medical Research Department, U. S.
Submarine Base, New London, by Lt. Comdr. Sulzman, Lt. Comdr. Cook, and Lieutenant
Bartlett, U. S. Naval Reserve. Opinions expressed in this paper are personal views of
the authors, and are not to be regarded as expressions for the Navy Department.



The consistency of measurement with any test depends upon the
details of the method and upon the competence of the operator. Changes
in method might be developed which would improve the reliability
of any one of the devices. But in order to evaluate each as used in
routine practice, the procedure employed for each instrument was in
every detail that prescribed by its manufacturer. Then especial effort
was made to ensure that operators never deviated from these procedures.
This point is significant because there are many details that a careless
operator may neglect which have an important bearing on the reliability
of the data obtained. It was assumed that in the Navy acuity examinations
are assigned ordinarily to enlisted Hospital Corpsmen. Hence, in
this evaluation the operators were enlisted personnel; but it is
emphasized that the operators were selected, trained thoroughly and
supervised closely.

The tests under investigation were given twice to a representative 
group of observers. The population was selected so that its variability 
in visual performance would approximate that encountered in routine 
Navy medical examinations. The total was comprised of 88 Navy en- 
listed personnel, 20 Civil Service employees, and 20 high school (twelfth 
grade) students. Probably the variability is less and the mean acuity 
greater than in general industrial practice; if so, the reliability data in 
this paper are unduly conservative insofar as industrial application is 
concerned. But in any case, the variability represented should be com- 
parable to that for any large unselected group of Navy personnel, and 
it was for application to such a group that the study was initiated. 

Care was taken before every test to ensure that the observer under- 
stood exactly what he was to do. No extra practice was allowed, how- 
ever, beyond that involved in the standard instructions. The schedule of 
tests was adjusted to randomize any practice or fatigue effects from one 
instrument to another. Some small practice or time effect does exist, 
for a consistent tendency for the retest scores to be a little improved 
over the test score is apparent in the results. This tendency may be a
specific practice effect, or it may be an artifact of testing-time sequence.
All initial tests were given in the morning, and all retests in the afternoon 
of the same day, so the difference in time of day might account for the 
changes in scores. No experiments to isolate the basis for the slight 
improvement were conducted. 

Many methods for measuring visual acuity might be utilized. However,
the medical profession ordinarily employs some procedure involving
charts of letters graduated in size. Thus the use of letter charts is
specified in the Manual of the Navy Medical Department. Furthermore,
there is a tradition for acuity charts to be made up of Snellen-type
characters. These facts dictated the choice of a chart-test with Snellen
letters as the typical sample of accepted practice against which to weigh
the new methods. The procedure for viewing the letters was that
specified by the Army-Navy-OSRD Vision Committee (2).

Two sets of Snellen letters were employed. One, referred to in this
report as “Regular Snellen,” was a set stocked routinely by the Navy
Department, and purchased from an approved supply firm. As such it is
doubtless representative of those in general usage (3), and therefore might
be considered the comparison reference for evaluating any instrument
designed to supplant chart procedures. The other set was constructed
locally; in this report it is referred to as the “Improved Snellen” set.
Its three charts differed from the three drawn from the stockroom in
that there were a few more letters used, they were reproduced with more
precision, and letters were better distributed for difficulty. Inspection
of the data for individual letters indicates room for considerable further
improvement. Nevertheless, as will be obvious from the reliability
coefficients listed below in Table 1, the charts constructed locally allowed
an acuity measure that is appreciably more consistent from one test to
another than are the measures with those from the medical supply firm.

Table 1 presents means, standard deviations and reliability coeffi- 
cients for each of the tests. Monocular data only are involved in these 
figures. For each test, the datum for each of the 256 eyes was the 
decimal equivalent for the Snellen fraction for threshold resolution. 
Threshold with charts was defined as the line immediately larger than 
the largest line with two or more errors; with the instrument tests, 
threshold was the target object immediately larger than the first two 
successively miscalled. 
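
The two threshold rules stated above can be sketched in code. This is only an illustrative sketch and not the scoring procedure actually used in the study: the function names, the data layout (lines or targets ordered from largest to smallest), and the treatment of edge cases (no failure found, or failure on the very first line) are assumptions introduced here.

```python
def snellen_to_decimal(numerator, denominator):
    """Decimal equivalent of a Snellen fraction, e.g. 20/40 -> 0.5."""
    return numerator / denominator


def chart_threshold(lines, test_distance=20):
    """Chart rule: the line immediately larger than the largest line
    read with two or more errors.

    `lines` is a list of (snellen_denominator, errors) pairs ordered from
    the largest letters to the smallest (assumed layout).
    """
    for i, (_, errors) in enumerate(lines):
        if errors >= 2:
            if i == 0:
                return None  # assumption: no threshold if the largest line fails
            return snellen_to_decimal(test_distance, lines[i - 1][0])
    # Assumption: if no line has two or more errors, credit the smallest line.
    return snellen_to_decimal(test_distance, lines[-1][0])


def instrument_threshold(targets):
    """Instrument rule: the target immediately larger than the first two
    successively miscalled.

    `targets` is a list of (decimal_acuity, called_correctly) pairs ordered
    from the largest target to the smallest (assumed layout).
    """
    for i in range(len(targets) - 1):
        if not targets[i][1] and not targets[i + 1][1]:
            if i == 0:
                return None  # assumption: no threshold if the first two fail
            return targets[i - 1][0]
    # Assumption: with no two successive misses, credit the smallest target.
    return targets[-1][0]


# Hypothetical record for one eye (not data from the study).
chart_lines = [(40, 0), (30, 0), (20, 1), (15, 3), (10, 5)]
print(chart_threshold(chart_lines))
```

In the hypothetical record, the 20/15 line is the largest line with two or more errors, so the threshold is taken as the 20/20 line.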

According to Table 1, each of the three instruments yields data with
at least as much reproducibility as may be expected with the “Regular
Snellen” charts. But no procedure tested is more reliable for
determining acuity for remote distance than the “Improved Snellen,” a method
utilizing more nearly adequate test charts. The evaluation for
proximate distance acuity measures does not allow such clear-cut statements.
Both the Ortho-Rater and the Sight-Screener furnish near acuity scores
that are more consistent from test to retest than are those for the
miniature Snellen charts when the latter test is adapted for measurements for
proximate distance.




Table 2 throws some light on the question of whether the instruments 
measure the acuity function that has been accepted in common medical 
usage. In order to obtain an index for that acuity, the mean of the test 
and retest readings with the “Improved Snellen” charts was computed 
for each eye. The degree of relationship between this index and the 







Table 1

Test-retest Consistency Data for Monocular Acuities

                       Test    Standard    Retest   Standard    Test-retest
Name of Test           Mean    Deviation    Mean    Deviation   Coefficient

Remote Distance
  Improved Snellen*    0.995     0.387      1.000     0.394        0.877
  Regular Snellen      1.179     0.437      1.201     0.444        0.797
  Ortho-Rater          1.054     0.279      1.081     0.246        0.850
  Sight-Screener       1.042     0.340      1.066     0.339        0.844
  Telebinocular        1.041     0.367      1.047     0.384        0.813
Proximate Distance
  Improved Snellen*    0.747     0.255      0.777     0.261        0.754
  Ortho-Rater          1.034     0.230      1.066     0.221        0.848
  Sight-Screener       1.028     0.302      1.055     0.327        0.765
  Telebinocular        0.869     0.211      0.916     0.211        0.708

* Eye-chart distance for “Remote Distance” is twenty feet, and for “Proximate
Distance” is fourteen inches.


score in the first test with each device was then assessed. Results are
presented in Table 2. For purposes of this evaluation, these expressions
of relationship are validity coefficients. It will be noted that the
correlation ratio for “Regular Snellen” is larger than that for any instrument.
This finding, in the light of the relative magnitudes of the test-retest
coefficients shown in Table 1, is of interest. It is obvious that another
tabulation, with coefficients corrected for attenuation, would indicate
much more validity for the “Regular Snellen” test than for any of the
instruments. The differences might be due to either one or both of two
possible causes. First, the lens system in the optical devices may
introduce some element in simulating long distance that is not present when
an observer actually regards a chart from a distance of 20 feet, or second,


Table 2
Coefficients of Correlation of Monocular Acuity Scores with the Mean of Two
Readings on the “Improved Snellen” Charts

                     Proximate     Remote
Test                 Distance*    Distance*

Regular Snellen          -          0.82
Ortho-Rater             0.70        0.72
Sight-Screener          0.64        0.74
Telebinocular           0.59        0.58

* The coefficient of correlation between the near and far “Improved Snellen” scores
for the 256 eyes represented is 0.56.







perhaps there is some difference in the types of visual target characters
in the various tests that may give rise to some perceptual differences
among observers. Both factors probably operate. Thus, the
Sight-Screener and the Telebinocular have letter-type characters somewhat
similar to the Snellen letters, and yet the coefficients for the Sight-Screener
manifest the effect of some extra factor almost to the same degree as do
the figures for the Ortho-Rater. So the writers reason that differences
in test characters cannot account for the entire effect, and assume that
the lens systems for simulating long distance may not be altogether
successful with all observers.
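
The remark about correction for attenuation can be made concrete with the remote-distance figures printed in Tables 1 and 2. The sketch below is offered only as an illustration under a stated assumption: each validity coefficient is corrected for the unreliability of the test alone, the criterion being treated as perfectly reliable (`r_yy = 1.0`). The paper does not report a corrected tabulation, and the function name is introduced here.

```python
import math


def correct_for_attenuation(r_xy, r_xx, r_yy=1.0):
    """Classical correction: r_xy divided by sqrt(r_xx * r_yy).

    r_yy defaults to 1.0, i.e. the criterion is assumed perfectly
    reliable (an assumption made for this illustration).
    """
    return r_xy / math.sqrt(r_xx * r_yy)


# Remote-distance validity coefficients (Table 2) and
# test-retest coefficients (Table 1) as printed.
validity = {
    "Regular Snellen": 0.82,
    "Ortho-Rater": 0.72,
    "Sight-Screener": 0.74,
    "Telebinocular": 0.58,
}
reliability = {
    "Regular Snellen": 0.797,
    "Ortho-Rater": 0.850,
    "Sight-Screener": 0.844,
    "Telebinocular": 0.813,
}

corrected = {
    name: correct_for_attenuation(validity[name], reliability[name])
    for name in validity
}
for name, value in corrected.items():
    print(f"{name:16s} {value:.2f}")
```

Because the “Regular Snellen” charts have the lowest test-retest coefficient among the remote-distance tests, the correction raises their coefficient the most, which is the direction of the effect the writers describe.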


Received July 15, 1946. 


References 


1. Visual acuity measurements with three commercial screening devices. Progress
Report No. 2 on Research Project X-493 (Av-263-p) of the Bureau of Medicine
and Surgery, U. S. Navy Department, February 1946.

2. Testing visual acuity: manual of instructions. This manual was developed by the
Subcommittee on Procedures and Standards for Visual Examinations of the
Army-Navy-OSRD Vision Committee.

3. Ferree, C. E., and Rand, G. A new visual acuity and astigmatism test chart.
Am. J. Ophthal., 1937, 20, 21-32.





Industrial Test Norms for a Southern Plant Population * 


George K. Bennett and Alexander G. Wesman 
The Psychological Corporation, New York City 


National norms for psychological tests are for some purposes the most
valuable; they are the standard to which the greatest number of
institutions can properly refer. In most cases, however, the usefulness of
general norms is diminished by the differences in composition between the
group to which the tests have been given and the national group on which
the norms are based. Selective factors in a specific population bring about
these differences. For a given population, the most meaningful norms
are local norms: norms based on a group of people most similar to the
group to which the tests are being given or to the group for which the
persons tested are applying. When a particular organization or
institution has a sufficiently large population of its own to yield reliable norms,
these are undoubtedly best and should always be obtained. If the local
population is not sufficiently large to yield reliable norms at a single
testing, but adequate data can be accumulated over a reasonable period
of time, such data should be gathered and used.

Where an organization or institution is not large enough to obtain
reliable local norms in either of these ways, the most profitable course is
to seek out such published or otherwise available data as pertain to groups
which most closely resemble the local population. The authors hope that
the data included herein will prove useful for this purpose.¹

The norms presented have been derived from a population of adult 
white applicants for employment in a large southern industry. The 
positions sought by the applicants are divided into office and plant work. 
Separate groupings have been made according to this classification as 
well as sex. The tests include: Bennett Mechanical Comprehension Test
Form AA; Revised Beta Examination; Hand-Tool Dexterity Test; and the 
Minnesota Vocational Test for Clerical Workers. As in most industrial 


* The test program from which these data were obtained was installed by the
Industrial Division of The Psychological Corporation. The authors are indebted to Mr.
Richard Fear of that division and to Mr. Laurence Ross of the Union Bag and Paper
Co., Savannah, Ga., for the test scores on which this paper is based.

¹ The writers urge that other professional test people who have access to similar
data publish them directly, or make them otherwise available to test publishers for
dissemination to test users. Such publication would do much to enhance the usefulness
of psychological tests for both employment and guidance purposes.




organizations, not all applicants were given all these tests. The practical,
rather than the “pure research,” approach was necessary. The data
should be found nonetheless useful, however, since the selection process
probably resembles that which would be used in any similar industrial
organization.


Test of Mechanical Comprehension Form AA 


The Bennett Mechanical Comprehension Test was administered to
1637 male applicants for plant jobs and 100 male applicants for office
work. Table 1 presents the score distributions for these two groups,


Table 1

Bennett Mechanical Comprehension Test Form AA
Scores and Percentiles of 1,737 Male White Applicants for Plant and Office
Positions in a Southern Factory

   Plant                         Office
 Applicants                    Applicants
 (N = 1637)                    (N = 100)

   Score       Percentile        Score

     55            99              59
     50            95              57
     46            90              51
     43            80              49
                   70              45

                                   44

                                   43

     27
     21                            33
     19                            24

     15                            20
      6                            11

   33.07          Mean           40.65
   10.74          S.D.           10.56

Correlation with Revised Beta = .56 (Plant applicants only).
Correlation with Hand-Tool Dexterity = .28 (Plant applicants only).


together with means, standard deviations and specified percentile equiv- 
alents. At the bottom of Table 1 correlation coefficients between this 
test and the Revised Beta Examination and Hand-Tool Dexterity are 
shown. 
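
For an organization assembling its own local norms, a table of this kind (scores at designated percentile points, with mean and standard deviation) might be computed roughly as follows. This is a minimal sketch, not the procedure used by the authors: the nearest-rank percentile rule, the use of the population standard deviation, the function names, and the example scores are all assumptions made for illustration.

```python
import statistics


def score_at_percentile(scores, percentile):
    """Nearest-rank score at an integer percentile (1 to 100)."""
    ordered = sorted(scores)
    # Ceiling division in integers avoids floating-point rounding surprises.
    rank = max(1, -(-percentile * len(ordered) // 100))
    return ordered[rank - 1]


def local_norms(scores, percentiles=(99, 95, 90, 80, 70)):
    """Summarize raw scores as a small norms table."""
    return {
        "n": len(scores),
        "mean": statistics.mean(scores),
        "sd": statistics.pstdev(scores),  # population S.D. (assumed choice)
        "percentile_scores": {
            p: score_at_percentile(scores, p) for p in percentiles
        },
    }


# Hypothetical raw scores for illustration only (not data from the study).
example_scores = list(range(1, 101))
norms = local_norms(example_scores)
print(norms["percentile_scores"])
```

With small local samples the extreme percentile points rest on very few cases, which is one reason the authors recommend accumulating data over time before relying on local norms.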

One of the interesting features of Table 1 is the fact that the
applicants for office jobs are consistently superior on the test to the plant job
applicants. It may be that the office applicants are a superior group in
general ability, and that these test scores simply reflect that fact. It
may also be that they are people whose mechanical comprehension is
their strongest feature and who should be considering plant rather than
office jobs. No data are available to illuminate this question, but the
plant reports that some of the men included among the “office
applicants” were actually candidates for supervisory mechanical jobs. It
might be noted parenthetically that at each of the designated percentile
points, the scores of these applicants fall between those of the twelfth
grade students and the engineering school freshmen reported in the
manual. The plant applicants fall generally between the tenth and
eleventh grade students.


Table 2
Hand-Tool Dexterity Test (Large Model)
Scores Equivalent to Designated Percentile Points for 1,123 Male White
Applicants for Southern Plant Jobs

Minutes    Seconds    Percentile

             12           99
             39           95
             58           90
             12           85
             23

             34
             44
             53
             02
             10

             19

             29
             41
             54
             07
             23

Mean
S.D.

Correlation with the Test of Mechanical Comprehension = .28 (Plant applicants only).
Correlation with Revised Beta Examination = .36 (Plant applicants only).







Hand-Tool Dexterity Test 


The Hand-Tool Dexterity Test was administered to 1123 male white
applicants for plant jobs. The score distribution, mean, standard
deviation and specified percentile points are presented in Table 2. As is to be
expected from a timed test of this sort, the distribution is skewed toward
the higher (poorer) scores. Since sectional cultural differences are
probably relatively unimportant in the skill measured by this test, the
distribution may be reasonably acceptable as a depiction of national norms.
The correlation of Hand-Tool Dexterity with Mechanical Comprehension
and Revised Beta is also shown in Table 2. The small size of the
coefficients speaks well for the usefulness of Hand-Tool Dexterity in
combination with other measures for the prediction of success in manual jobs.


Table 3

Revised Beta Examination
Scores Equivalent to Designated Percentile Points for 1,362 Male and
1,083 Female Applicants for Southern Plant Jobs

    Men                         Women

   Score       Percentile       Score

    115            99
    108            95
    103            90
     99            85
     96            80

     93            75
     91            70
     88            65
     86            60
     84            55

     31                           19

   80.54          Mean          72.91
   17.66          S.D.          17.52

Correlation with Hand-Tool Dexterity Test = .36 (Male plant applicants only).
Correlation with the Test of Mechanical Comprehension = .56 (Male plant
applicants only).


Revised Beta Examination 


The Revised Beta Examination was administered to 1362 men and
1083 women applicants for plant jobs. The scores equivalent to specified
percentile points, means and standard deviations for each sex are
presented in Table 3, together with the correlation of Beta with Hand-Tool
Dexterity and Mechanical Comprehension for the men only. The norms
are particularly valuable since they are the most extensive data which
have become available to the publishers thus far, in spite of the wide use
of this test. Sectional differences should be taken into account,
however, in using Table 3 for normative purposes.


Table 4
Minnesota Vocational Test for Clerical Workers
Score Equivalent to Designated Percentile Points for 162 White Female
Applicants for Office Work in a Southern Plant

 Score on Part I                     Score on Part II
(Number checking)    Percentile      (Name checking)

       162               99               173
       154               95               155
       140               90               143
       129               85               138
       122               80               132

       117               75               125
       115               70               121
       112               65               115
       109               60               111
       108               55               107

       103               50               102

       100               45               100
        98               40                97
        96               35                95
        94               30                91
        90               25                88
        87               20                86
        83               15                82
        78               10                77
        70                5                71
        57                1                47

     108.10             Mean
      26.44             S.D.










Minnesota Vocational Test for Clerical Workers 


This test was administered to 162 female applicants for office work in
the same organization. Data are presented in Table 4 for Part I
(number checking) and Part II (name checking). No correlational data are
available for this group. The scores of these women most closely
resemble those of the ninth grade girls presented in the manual for the test.


Received June 24, 1946. 


References 


Bennett, George K. Manual for Test of Mechanical Comprehension Form AA. New
York: The Psychological Corporation, 1940.

Bennett, George K. Manual for Hand-Tool Dexterity Test. New York: The
Psychological Corporation, 1946.

Andrew, Dorothy M. and Paterson, Donald G. Manual for Minnesota Vocational Test
for Clerical Workers. New York: The Psychological Corporation, 1933.

Kellogg, C. E. and Morton, N. W. Manual for Revised Beta Examination. New
York: The Psychological Corporation, 1935.





A Note on Adapting the Minnesota Rate of Manipulation 
Test to Factory Use 


M. N. Oxlade and K. F. Walker 


Industrial Welfare Division, Commonwealth Department of Labour and 
National Service, Sydney, Australia 


In a recent investigation into the use of aptitude tests for the
selection of women for cotton textile spinning,¹ a modified form of the
Minnesota Rate of Manipulation Test was tried out. While the modifications
introduced were not found to improve the test, evidence was found in
support of the tentative conclusions reached by Wilson,² who found that
if the score on the first part of the test (Placing) was taken as the sum
of three trials, or the best of three trials, the results obtained were
practically identical with those given by the orthodox scoring method.
Similar data were collected on the second part of the test (Turning).


Table 1

Correlations Between Various Scores in Rate of Manipulation Tests

Scores Correlated                            Placing        Turning

Sum of 4 trials and sum of 3 trials        +.95 ± .007       +.96
Sum of 4 trials and best of 4 trials       +.91 ± .01        +.91
Sum of 4 trials and best of 3 trials       +.90 ± .01        +.87
Sum of 3 trials and best of 4 trials       +.97 ± .007       +.88
Sum of 3 trials and best of 3 trials       +.88 ± .02        +.87
Best of 4 trials and best of 3 trials      +.91 ± .01        +.90
Sum of 4 trials and sum of 2 trials        +.91 ± .01        +.92
Sum of 3 trials and sum of 2 trials        +.95 ± .01        +.94





Product-moment correlations (N = 67) were calculated for each
part of the test between the sum of four trials (orthodox scoring) and
the best of four trials, the sum of three trials, the best of three trials,
and the sum of two trials; coefficients were also computed between the
sum of three trials and the best of four trials, the sum of three trials and
the best of three trials, the best of four trials and the best of three trials,
and the sum of three trials and the sum of two trials. The coefficients
and probable errors are shown in Table 1.
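
The probable error of a product-moment coefficient is conventionally computed as 0.6745(1 - r²)/√N. The note does not say which formula was applied, so the sketch below assumes this classical one; the function name is introduced here for illustration.

```python
import math


def probable_error_r(r, n):
    """Classical probable error of a correlation coefficient:
    0.6745 * (1 - r**2) / sqrt(n)."""
    return 0.6745 * (1 - r ** 2) / math.sqrt(n)


# Two of the Placing coefficients from Table 1, with N = 67.
for r in (0.95, 0.88):
    print(f"r = {r:.2f}  P.E. = {probable_error_r(r, 67):.3f}")
```

Under this assumption the formula gives values close to the printed probable errors for these two coefficients; small differences elsewhere in the table would be expected from rounding in the original.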

¹ Reported in Bulletin of Industrial Psychology and Personnel Practice, Dept. of Labor
and National Service, 1946, Vol. 2, No. 2.

² Wilson, G. M. and Staff. Adapting the Minnesota Rate of Manipulation Test to
factory use. J. appl. Psychol., 1945, 29, 346-349.




It will be seen that for both Placing and Turning, all the coefficients
are high and significant. In Placing, the use of the sum of three trials
would give results nearest to those obtained by the orthodox method,
although it should be noted that the coefficient is equally high between
the sum of three trials and the sum of two trials. This suggests that
if time were very limited, satisfactory results might be obtained by using
the sum of only two trials. The results for Turning are similar.
Shortening of the test to two trials, however, might lower the reliability of
the test and it would probably be safer to retain the third trial.


Received June 5, 1946. 





The Superiority of College Students on the Minnesota 
Rate of Manipulation Test * 


Harold G. Seashore †


The Psychological Corporation, New York City 


The Minnesota Rate of Manipulation Test¹ was included in a battery
of motor skill tests administered to a group of 100 college students. The
Placing and Turning tests were given according to Ziegler’s manual: a
practice trial, followed by four trials. The students came in groups of
two, three, or four. Each man completed his first trial before the first
man started the second trial, and so on. The subjects first completed the
Placing Test and then the Turning Test. The audience never included
more than the men being tested and the examiner. The men waiting
their turns were not allowed to talk to the examinee nor to give any social
motivation beyond that of their being present, quietly waiting a turn.

When the data had been gathered it was observed that these men
rated extremely high on the norms given in Ziegler’s first manual. At
once the hypothesis was suggested that, since Springfield College men
are chosen for college partly on the basis of superior motor coordinations
for physical education and the various crafts which are associated with
recreation and camping, they could be expected to test high in the arm,
hand, and finger motions of the test. To check this idea, Dr. Howard
White arranged for the testing of 48 sophomores at the University of
Maine who were not physical education men nor engineers.

The data are presented in Tables 1 and 2. In Table 1 it is seen that 
the Springfield and Maine samples have practically the same means and 
standard deviations. The normative means, as presented in the 1946 
manual, are considerably larger. The college men perform on the Rate of 
Manipulation Test with greater speed than the normative sampling of 

*This article is a “prior publication,” the author paying complete costs. The 
scheduled 80 pages per issue is thereby increased by the corresponding amount; thus 
the “early publication” of this article is a direct contribution to the subscribers of the 
Journal of Applied Psychology without handicap to those authors whose articles are 
accepted and printed in their regular turn. 

† The research was completed in the Department of Psychology, Springfield College,
in the years 1939-1940.

¹ Ziegler, W. A. Minnesota Rate of Manipulation Test: Manual and Equipment.
Minneapolis: Educational Test Bureau. No date given. In 1946 a new manual was
published with revised norms.




Table 1
Distributions of the Springfield and Maine Men on the Minnesota
Rate of Manipulation Tests

Average Seconds             Placing                    Turning
for 4 Trials          Springfield    Maine       Springfield    Maine

31 to 62              [frequency entries illegible in this copy]

N                                                                 48
Mean                                                            42.06
S.D.                                                             4.09
50%ile score
  (a) Original Manual*
  (b) 1946 Manual†

* Manual in use in 1940.
† Manual published in 1946.







Table 2
Number and Percentage of Springfield and Maine Men in Each Quintile of the
Norms on the Placing and Turning Tests

                        Placing                    Turning
                  Springfield    Maine       Springfield    Maine

[Most cell entries are illegible in this copy; surviving figures are given
in the order printed.]

Upper fifth       83   67
Second fifth      19
Middle fifth      14
Fourth fifth      0
Lowest fifth      0

N                 96


men in general. The manuals do not present the standard deviations,
but they can be estimated² from the normative tables (1946): Placing,
approximately 4.7 seconds; Turning, 4.2 seconds.
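
The footnoted estimate of the standard deviation rests on the 10th-to-90th percentile range. For a normal distribution that range spans about 2.563 standard deviations, so dividing the range by that constant recovers the standard deviation. The sketch below illustrates the arithmetic under the normality assumption; the function name and the example percentile values are hypothetical and are not taken from the manual.

```python
from statistics import NormalDist


def sd_from_percentile_range(p10, p90):
    """Estimate a standard deviation from the 10th and 90th percentile
    scores, assuming the scores are normally distributed."""
    z90 = NormalDist().inv_cdf(0.90)  # about 1.2816
    return (p90 - p10) / (2 * z90)


# Hypothetical percentile scores in seconds (illustration only).
print(round(sd_from_percentile_range(36.0, 48.0), 2))
```

A 12-second range between the two percentile points would thus correspond to a standard deviation of a little under 4.7 seconds, the order of magnitude reported for Placing.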

It was thought that perhaps the social motivation of having two to 
four men rotating trials might have accounted for the efficiency of the col- 
lege men. The test manual provides some indirect evidence on this 
possibility. Norms are provided for “group testing,” a situation in 
which several persons simultaneously take the test on duplicate sets of 
test equipment. On each trial the subjects are begun together. This 
“group testing” situation surely should provide an opportunity for social
competition to operate. However, the table of norms shows small
differences in median scores for group testing and individual testing. It
appears that the normative standard deviation is a little greater for group 
testing. However, the norms are so similar that the outstanding superi- 
ority of the college men can hardly be accounted for on the basis of simple 
social competitiveness in the presence of others taking the test. 

In Table 2 the scores secured by the men in the two samples are 
sorted into quintiles on the 1946 norms to show how vastly superior 
these college men are to the presumably normal sample upon which the 
norms are based. 

Since this research was completed in 1940, several papers involving 
the test have appeared. Some are included in the 1946 manual. Two, 
not included, are reported here to reinforce the point just made about the 


² The standard deviation was estimated by the formula $\sigma = D / 2.563$, where
$D$ is the 10-90 percentile range. Kelley, T. L., Statistical method. New York: The
Macmillan Company, 1923, p. 104.







superiority of the college men. Wilson³ presents 63 cases and it appears
that his median score was 57.0 seconds on the Placing Test. His median
percentile rank was 58 on Ziegler’s original norms and 77 on the 1946
norms. These factory workers were better than the normative groups
but not as good as the college men studied herein.

In another investigation⁴ in which the Minnesota Test was included
as one of several dexterity tests, the authors present mean scores for men
and women. The median education was high school graduation and
median age was 18 years 9 months. On the Placing Test, the mean scores
of the 49 boys and 35 girls were at the 58th and 50th percentiles,
respectively; on the Turning Test, they rated at the 55th and 50th percentiles,
respectively. This group of noncollege young people achieved scores
quite similar to the medians on the 1946 norms.

That the testing was reliably done is shown by the coefficients of 
reliability in Table 3. The values have been corrected by the Spearman- 


Table 3
Reliability of the Placing and Turning Tests

                          Placing    Turning
Trials Combined*             r          r

2 and 3 vs. 4 and 5         .78        .79       Springfield
2 and 5 vs. 3 and 4         .78        .85       Springfield
2 and 4 vs. 3 and 5         .80        .90       Springfield
2 and 4 vs. 3 and 5         .80                  Maine

* Trial 1 is practice; trials 2-5 are scored.


Brown formula. The 1946 manual reports reliability coefficients of .87
and .91 for Placing and Turning, respectively. The range of scores
presumably was greater in the studies from which these values were derived.
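
The Spearman-Brown correction applied to the split-trial coefficients can be sketched as follows. The formula is the standard one for a test lengthened by a given factor; the function names are introduced here, and the inverse function is included only to show what uncorrected half-test correlation a tabled value would imply. The paper does not report the uncorrected correlations.

```python
def spearman_brown(r_half, factor=2.0):
    """Reliability of a test lengthened `factor` times, given the
    correlation between comparable parts."""
    return factor * r_half / (1 + (factor - 1) * r_half)


def implied_part_correlation(r_full, factor=2.0):
    """Inverse of the Spearman-Brown formula: the part correlation that
    would step up to `r_full`."""
    return r_full / (factor - (factor - 1) * r_full)


# A corrected coefficient of .80 (as tabled for Placing, trials
# 2 and 4 vs. 3 and 5) implies a half-test correlation of about .67.
print(round(implied_part_correlation(0.80), 2))
```

The same function with a factor other than 2 indicates how reliability would be expected to change if trials were added or dropped, assuming the trials are comparable.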

The Placing and Turning tests are substantially related as indicated 
by r’s of .58 and .46 for the Springfield men and Maine men, respectively. 

The tests seem to be unrelated to scholastic aptitude. Scores on the 
A.C.E. Psychological Examination correlated .02 with Placing scores 
and .20 with Turning scores of the 96 Springfield men. Within the re- 
stricted range of college-men, the motor functions of the Rate of Manip- 
ulation Test and mental ability are not related. These data of course do 
not demonstrate whether or not success on these tests is related to mental 
ability over the range of adult intelligence one meets in the employment 


³ Wilson, G. M. Adapting the Minnesota Rate of Manipulation Test to factory
use. J. appl. Psychol., 1945, 29, 346-349.

⁴ Steel, M., Balinsky, B., and Lang, H. A study of the use of a worksample. J.
appl. Psychol., 1945, 29, 14-21.







situation. Widespread evidence from other studies, however, would
cause one to discount scholastic ability as an important variable in
performance on the Minnesota Rate of Manipulation Test.
An hypothesis not investigated by the writer is that college men score 
relatively high because of generally higher motivation, apart from the 
This 
hypothesis could stand up even though the manual for the test shows
only slight differences in norms for singly-tested persons and group-tested
persons. This hypothesis regarding generalized differential motivation
in test-taking situations deserves further study.


Received March 10, 1947.





The Improvement of Performance on the Minnesota Rate of 
Manipulation Test When Bonuses are Given * 


Harold G. Seashore †
The Psychological Corporation, New York City 


The first sentence in the manual for the Minnesota Rate of
Manipulation Test¹ reads as follows: “The Minnesota Rate of Manipulation
Test is designed to measure native speed capacity.” Later in the same
paragraph the following statement is made, “Basic rate of manipulation
is primarily a native trait, improvable only to a very limited extent.”

These statements seem to be excessively dogmatic with reference to
the improvability of motor performances on manipulative tests like this
one. Two plans of testing the statements were suggested. First, one
might try a simple experiment in which repeated trials on the test were
secured from a variety of subjects under very general conditions of moti-
vation. Second, one might attempt to secure improvement with a spe-
cialized form of motivation, namely, bonuses for improvement. This
latter plan was adopted, since it was felt that applicants for employment
who were being tested by such a test would be in a highly motivated sit-
uation; they would have much to gain by good performances on the test.

The sophomore psychology class had already taken the test under
standard conditions as described in another research.² Seven students
were accepted as volunteers from this class. Their average score for
four trials on the Placing Test was taken as the base line for computing
bonuses and penalties.

Only the Placing Test was used. For economy, a daily test consisted 
of three trials without practice, rather than one practice plus four scored 
trials. Each trial was timed to the closest one-tenth second. A subject's 


* This article is a “prior publication,” the author paying complete costs. The
scheduled 80 pages per issue is thereby increased by the corresponding amount; thus
the “early publication” of this article is a direct contribution to the subscribers of the
Journal of Applied Psychology without handicap to those authors whose articles are
accepted and printed in their regular turn.

† This study was completed in 1940 while the author was professor of psychology at
Springfield College.

1 Ziegler, W. A. Minnesota Rate of Manipulation Test. Manual. Educational
Test Bureau, Minneapolis.

2 Seashore, H. The superiority of college students on the Minnesota Rate of Ma-
nipulation Test. J. appl. Psychol., 1947, 31, 249-253.







Test Performance Improvement under Bonus 255 


daily score was the average of the trials. Tests were given on nine days
over a three-week period with from two to four of the men coming in to-
gether for their tests. There always was rivalry even though each man
was competing for bonus money only against himself. To add to the
social stimulation, the men posted their scores on the blackboard and
noted the bonuses and penalties in a column next to the scores. Their
classmates in general psychology talked frequently about the progress of
the experiment.

The bonus consisted of twenty-five cents for each full second the
subject could reduce his daily score below his best daily score. In a few
cases, the quarter was paid if he came within one tenth of a full second
of improvement. The highest daily bonus was $1.50 paid to Subject F on
the first day. In order to keep the men pressing for gains, they were pe-
nalized modestly in the form of a ten-cent fine if on any day their score
for the day was as much as one-tenth of a second poorer than for the
just preceding day. A total of sixteen dollars in bonuses was earned and
the penalties totalled two dollars and fifty cents. The amounts were
paid out at the end of the experiment. Individual earnings are shown
in Table 1. On the first day, Subject D declared that his goal was to earn
enough to buy a pair of spring slacks; he earned the most money.
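
The bonus and penalty rule just described can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `daily_bonus_cents` is hypothetical, "best daily score" is read as the best score obtained before the day in question (the initial average on the first day), and the occasional leniency of paying a quarter when a man came within one tenth of a full second is not modelled.

```python
def daily_bonus_cents(score, best_so_far, previous_day):
    """Return (bonus, fine) in cents for one day's average time in seconds.

    Assumed reading of the rule: 25 cents per FULL second gained over the
    best earlier daily score; 10-cent fine if the day's score is at least
    one-tenth of a second poorer than the preceding day's score.
    """
    # Work in hundredths of a second to avoid floating-point surprises.
    gain_hundredths = round((best_so_far - score) * 100)
    bonus = 25 * max(gain_hundredths // 100, 0)

    loss_hundredths = round((score - previous_day) * 100)
    fine = 10 if loss_hundredths >= 10 else 0
    return bonus, fine


# Subject F on the first day: 57.00 seconds initially, 51.00 on Day 1.
print(daily_bonus_cents(51.00, 57.00, 57.00))
```

With Subject F's first-day figures the sketch gives a bonus of 150 cents and no fine, which agrees with the $1.50 reported in the text.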


Table 1

Scores for Ten Days of Seven Subjects on the Minnesota Rate of
Manipulation Test under Bonus Conditions

Subject                     A       B       C       D       E       F       G

Initial Score             45.75   49.50   53.00    ...    54.75   57.00   59.00
Day 1                      ...    52.56   50.56   49.26   51.33   51.00   59.76
Day 2                     43.06   52.83   46.60   47.16   49.06   51.96   56.13
Day 3                     40.86   48.73   45.33   42.96   47.70   51.2?   51.13
Day 4                     41.90   50.40   46.90   48.40   50.60   50.70   49.96
Day 5                     39.90   47.13   44.53   40.06   46.23   46.96   47.46
Day 6                     42.40   44.90   45.13   ?0.64   45.80   47.60   45.50
Day 7                     38.90   45.73   47.78   40.76   44.80   47.90   47.70
Day 8                     42.00   46.70   48.23   42.50   45.46   47.33   43.50
Day 9                     42.67   42.20   44.50   40.5?   44.56   47.40   49.90

Bonus                     $1.10    ...     1.35    3.10    1.80    2.20    3.20
Initial Percentile Rank    99+     99+     95      91      89      79      64
Final Percentile Rank      99+     99+     99+     99+     99+     99+     99

[Entries shown as "..." or containing "?" are illegible in the source copy.]


The essential data are presented in Table 1. The average time in 
seconds for the trials on each day is presented; the initial score, secured 
before the subjects were told that they were to participate in a bonus





256 Harold G. Seashore 


study, is given in the top line. At the bottom of the table the amount
of money earned in bonuses is given. The initial percentile rank and the
final percentile rank are also shown.³

A word should be said about the subjects. It had already been
demonstrated that nearly all of these students were superior on the Min-
nesota Rate of Manipulation Test when given under standard conditions
without any unusual motivation. In the next to the last line in Table 1
the percentile rank of the seven subjects for their initial scores is given.
Two ranked at the 99th percentile, three were between the 89th and 95th
percentiles, one was at the 79th; only one, at the 64th percentile, could
be considered a “poor” performer in relation to his fellow students, al-
though even he was above the median for adults in general. In short,
the subjects were a superior group, and it could be expected that im-
provement in performance would be more difficult with a group who were
already approaching the highest performances secured in the standard-
ization of the Minnesota Rate of Manipulation Test. If their scores
really represented limits “of their ability,” as the manual suggests, then
later scores generally would be poorer because of the regression factor.
One could assume that with subjects well below average greater possi-
bilities of improvement could be expected.

Considerable improvement occurred in the Placing Test during the 
nine practice periods of three trials, each under the conditions of bonuses 
and personal rivalry. Every man made a considerable reduction in the 
average number of seconds required for trial. Subject D, for example, 
showed an improvement of about fourteen seconds, reducing the time 
from fifty-four seconds to forty seconds. Subject A, who had the best 
performance initially, showed the least gain, reducing his time about three 
seconds, but it is noted that on the fifth and seventh days he performed 
considerably better than on the eighth and ninth days. The final line 
in the table shows that six of the subjects could be given a percentile 
rank of 99+, while one subject just managed to secure the rank of 99. 

Table 2 presents the ranking of the men at each day of the experiment 
together with summary ranks at the end of the first five days, at the end 
of the second five days, and at the end of the whole period. Learning did 
not occur at an equal rate for each subject, as shown by the changes in 
rank from day to day, and when the ranks based on the first five days are 
compared with the ranks based upon the second five days. At the end of 
the experiment the original first four men in rank were again the first four 
in rank, with small changes in order. At the end, the three slowest men 
were still the three slowest. However, if the experiment had stopped 


3 The ranks in this table are read from the 1946 edition of the manual for the test 
which replaces the manual existing at the time of the experiment. 







Table 2

Ranks of the Subjects for the Ten Days with Summaries for the First Five Days,
the Second Five Days, and All Ten Days

[Table body not recoverable from the source copy. Columns: Subjects A to G.
Rows: Initial Rank, Day 1 to Day 9, First five days*, Days 5-9, Ten days.]

* Initial day and experimental days 1 to 4.


one day earlier or, for that matter, on any other day, considerable discrep-
ancies in rank would have been apparent.


Discussion 


Rarely does anyone assume today that a vocational aptitude test
measures “native capacity” apart from experience. We have gone far
beyond that in our thinking. However, we still make another assumption 

which may be just about as bad—that the subjects have had equal or 
nearly equal experience, and that rates of learning of a skill or work- 
methods can be neglected. The person scoring higher on a test is judged 
to be the better performer within the limits of reliable measurement of 
the test. We tend to forget that a low-ranking person might with smaller 
amounts of further training actually surpass persons now exceeding his 
performance. 

Four variables appear to be vital: native capacity which we cannot
measure well and probably do not need to measure; present status of
aptitude (including established work-methods) which we can measure
quite readily; rate of learning in the particular skill, which variable we
rarely bother about measuring; and motivation. To secure optimum
measurement for the purposes of appraising an individual’s motor ability,
for selection or guidance, we should secure adequate ratings based upon







present performance, rate of learning, and motivation. It probably will
be a long time before we can measure adequately in this manner. We
nevertheless should be aware of the problem and set our research goals
in the direction of its solution.

To illustrate this matter factually, note that Subject A was the first-
ranking in the initial testing, but on the ninth day was third-ranking.
However, his over-all picture shows him to be the first-ranking person at
the end of the study. Subject D was fourth-ranking at the beginning
but showed considerable improvement and never thereafter ranked lower
than third in the group from day to day, and at the end his average per-
formance placed him in second place. At the end of nine days D was a
better performer than Subjects B and C, particularly in day-to-day sta-
bility of performance, as seen by inspection of the changes in rank from
day to day of these men. Other personnel variables being equal, D prob-
ably should be given employment preference over B and C, and he prob-
ably is as good a man as A.⁴ The standard industrial use of the test,
however, would have rated him fourth. Needless to say, these samples
are too small to permit elaborate statistical determination of signifi-
cance, and the above discussion is primarily to point out the nature of
data which are important in the proper description of performance on a
motor ability test.

Since men want the job they are applying for, and since they should
be trying to do their best, it is usually assumed that their motivation is
equal. However, we usually neglect the fact that a person with a faulty
work method⁵ may be barred because of its effect on his score, but he
might gain very rapidly once his bad habits are pointed out to him; brief
tutelage may make great differences in scores. To assume that good
performers will discover their own faulty movements promptly is con-
trary to what is known about time and motion studies. Subject D had
a specific challenge to earn a pair of trousers. Even with the motivation
of a big gain on the first experimental day, Subject F never really did get
into the spirit of the experiment and was most difficult to get to come on
time. Subject G had faulty methods of work which irritated the other
men as they watched him work, but there was a gentleman’s agreement
not to talk over with each other any techniques of handling the blocks.

After the formal experiment was over, Subject A, the best man, said
he wanted to beat his record, and thought he could do it by performing
several trials consecutively with a few moments of rest between trials.


4 This assumes validity of the test for the personnel functions involved, but this is
not the issue in the present report.

5 Seashore, R. H. Work habits: a neglected factor underlying individual differences.
Psychol. Rev., 1939, 49, 123-141.







He completed seven trials, then rowed with the crew for one hour and
returned to do three more. His times were: 43.35, 39.0, 37.3, 42.0, 35.6,
37.0, and 37.0; after rowing, 41.5, 42.3, and 36.6. The last three trials
before rowing average better than his best average daily score (38.90) he
was able to show under bonus conditions.

Mention should be made again of the fact that improvement noted in
this study was made by subjects who already had been superior perform-
ers and the inference is that considerably greater improvement could be
expected from persons who on first testing showed average or inferior per-
formances.

A large-scale study involving the whole range of ability seems indi-
cated. It may well be that in a larger-scale study the correlation of rank
secured on the first standard testing and rank secured after a training pe-
riod will be so high that the initial testing is a satisfactory predictor.
However, in making such a study, one should, if possible, control another
variable, namely, the role of tutelage. It might be possible that final
rank and initial rank will not correlate highly under all conditions; for
instance, under conditions where each individual is given the necessary
tutelage to remove obviously faulty work habits.
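
The correlation of initial rank with final rank proposed above might be computed as in the following sketch. The choice of Spearman's rank-difference coefficient is an assumption made for illustration, not a method named in the report; the function names and the seven pairs of times are likewise hypothetical, and tied scores are assumed not to occur.

```python
def ranks(times):
    """Rank times from fastest (1) to slowest; assumes no tied values."""
    order = sorted(range(len(times)), key=lambda i: times[i])
    result = [0] * len(times)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result


def spearman_rho(initial_times, final_times):
    """Spearman rank-difference correlation: 1 - 6*sum(d^2) / (n(n^2-1))."""
    n = len(initial_times)
    initial_ranks = ranks(initial_times)
    final_ranks = ranks(final_times)
    d_squared = sum((a - b) ** 2 for a, b in zip(initial_ranks, final_ranks))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Hypothetical times in seconds for seven men, before and after training.
initial = [45.0, 49.0, 53.0, 54.0, 55.0, 57.0, 59.0]
final = [42.0, 43.0, 44.0, 40.0, 45.0, 47.0, 49.0]
print(round(spearman_rho(initial, final), 3))
```

A coefficient near 1.00 would support the view that the first standard testing is a satisfactory predictor; a markedly lower value after tutelage would point the other way.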

The problem of constructing tests which will get at the really impor-
tant variables in industrial performance is obviously difficult and the
tests which really will predict employee production best probably will
not be as economical or neatly arranged as those now available. It is
important that in the long run we discover testing procedures which
will yield information about these more significant variables. Native
capacity we probably will not measure; present status we can measure
readily. Rate of learning (and the function of tutelage during training)
and motivation are variables which need more experimental study.


Received March 10, 1947. 





Value of Color in Advertising 


Lucien Warner 
LIFE Magazine, New York City 


and Raymond Franzen 


Consultant, New York City 


Many tests of the value of color in advertising have been reported 
and the findings are conflicting. 

Rather than become embroiled in consideration of the relative im-
portance of pure recall, aided recall, recognition, visibility, prestige value,
affective value, attention value, fixation time, confusion effect and so on,
in the present study we took off from the simple question: “What does
the advertiser want his advertisement to do?”

For some the answer is that he wants to create an association in the
consumer’s mind between the product and his brand name. To the man
who has a new brand to sell, an advertisement which leaves a brand im-
pression on more people than competing advertisements is to that extent
more successful.

Therefore, in one test the respondent was exposed to advertisements 
in a setting which resembled the usual exposure of advertisements to 
magazine readers as closely as is possible in an interview situation. This 
was followed by an interval to represent the period between the seeing 
of an advertisement and the occasion to buy the product. During it the
respondent’s mind was taken off the advertisements he had seen, by ques-
tioning. He could hardly have been deliberately attempting to retain
the brand names in his mind because he was unaware that he would later
be quizzed. He could not have gathered the purpose of the study from 
the interviewers since they were uninformed. Finally the respondent 
was read a list of products and asked to reply, if possible, with the brand 
names promoted by an advertisement he had been shown. This process 
presumably represented the buying situation, where the customer feeling 
the need of a product does (or does not) call to mind a trade name. 

Experimental situations never perfectly mimic corresponding real-
life situations and this test is no exception. It would have been better if
the seeing of the advertisements had been spontaneous rather than re-
quested, if the interval had been longer and if the naming of a brand
name had been, instead, the actual purchase (or failure to purchase)




of a branded article. Recognizing these and other limitations of the 
study, we must still admit that the test pits 4-color and black and white 
advertisements against each other on fairly equal terms. We can look 
upon the data in hand as the basis for an estimate of the relative impact 
of these 4-color and black and white advertisements. 

But this test is particularly related to the function of only one kind 
of advertising. A large part of the advertising in today’s magazine is of 
trade names familiar to all. Most people asked to name a gum will men- 
tion Wrigley’s. The word Ford is very definitely associated with an 
automobile. The product-brand name association could hardly be more 
firmly established. What the advertiser must do with non-users of his 
brand is to create or increase the following association: brand name-pres- 
tige. In other words, the advertiser must transform mere familiarity with 
the brand into real “I want it.” With the people who already own and
use his brand, the advertiser wants to create or preserve, and, if possible, 
to enhance this association: brand name—pride of ownership. The owner 
must be made to talk about his possession to his less fortunate friends. 
Also he must be made a repeat buyer. 

While the advertisement that aims to promote a new brand must 
primarily pound home the trade name, the advertisement which seeks to 
heighten the luster of prestige and quality already associated with a fa-
miliar brand must primarily please, attract, interest. So the second of
the two factors measured in the survey was the ability of an advertisement 
to intrigue or interest a reader. This was done by asking the observers 
to indicate as they paged through a binder of full-page advertisements 
any which particularly interested them. 


The Sample 


For every 4-color advertisement promoting a given product used with 
one sub-sample of 496 people, a corresponding black and white adver- 
tisement was used with a matching sub-sample of 496. Thus, compar- 
ative measures of the two advertisements were obtained. One thousand 
interviews were assigned but, actually, each sub-sample was 4 interviews 
short. 


The one-thousand were assigned in the manner indicated in Table 1. 
The obtained sample is given in Table 2. Education was unassigned 
but came out as shown in Table 3. 

The distribution of education was about what we would expect since 
we omitted the D economic group. Obtained quotas for sex, age and eco-
nomic status approximated those assigned.





Lucien Warner and Raymond Franzen 


Table 1

Intended Distribution of Sample in Each of Ten Cities

[Table body not recoverable from the source copy. Columns: Sample A (Men,
Women) and Sample B (Men, Women). Rows: Sex and Age (20-29, 30-44,
45 and over); Economic (A, B, C, D).]





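Table 2

Actual Distribution of Sample in Each of Ten Cities

Age
   29.7%    20-29
   39.0     30-44
   31.3     45 and over

Sex
            Male
            Female

Economic
   20.2     A
   36.3     B
   43.5     C
            D
            N.A.

[Only one column of percentages is legible in the source copy; the column
headed Sub-Sample A or Sub-Sample B to which it belongs, and all remaining
entries, are not recoverable.]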










Table 3

Obtained Distribution by Education Level

Sub-Sample A                                   Sub-Sample B

     .6          N.A., No Schooling                ...
   11.0          Grammar School only              16.7
   56.5          High School                      52.3
   31.9          College                          30.8

[The entry shown as "..." is illegible in the source copy.]





The ten cities were selected to provide a spread in type (industrial,
market center, institutional) and in size:


Los Angeles, California Dover, New Jersey
Buffalo, New York Raleigh, North Carolina
Urbana, Illinois Fargo, North Dakota
San Antonio, Texas Chicago, Illinois
Altoona, Pennsylvania Hartford, Conn.


A filter question eliminated from the sample all people who said they 
had not looked through a copy of one of the three leading general weeklies 
during the past 3 or 4 months. The 992 people who replied “Yes” to
this question were invited to go through a book in which were bound 
twenty full-page advertisements from recent issues of a general weekly 
with very high circulation. 


Selection of the Material 


Color and black and white should be represented by equally good 
advertisements. Since the creators of advertisements are concerned 
with other matters than producing equivalent 4-color and black and white 
material for test purposes one might argue that it would be best to create 
synthetic advertisements for the purpose. This, however, would be most 
unrealistic and would yield results of more interest to an academician than 
to a business man.

In the present study, therefore, such uniformity in advertising excel- 
lency as was achieved derives from the assumption that there is a uni- 
formly high level of creative ability applied to the production of full-page 
advertisements recently appearing in a popular mass magazine having a 
very wide circulation. 

At first thought one might suppose that the highest degree of com-
parability would exist between two advertisements exactly alike in lay-
out, wording, picture, size, etc., and differing only in the dimension of
color. Actually we have good reason to believe that this is not true. 
Experts in the field tell us that an advertisement created with only black 







ink in mind is inevitably different in many respects from one created for
4-color. A layout of maximum effectiveness for the one would almost
always fall short of maximum for the other. It seemed wise for us to fol-
low the experts in this matter and to select as most nearly comparable
two advertisements prepared by the genius of a single advertising agency,
advertising the same product in the same medium to promote the same
trade name. Therefore, these advertisements were selected arbitrarily
as follows:


Starting with the most recent issue of a mass weekly magazine and working 
backwards through four months, and in each case starting with the last page 
of an issue and working toward the front, all full-page advertisements in 4-color 
or black and white were arranged in order. The pairing of advertisements 
was determined by taking the most recent black and white and the most recent 
4-color advertisement which promoted the same product (or service) and the 
same trade name. Thus the experimenter exercised no judgment in the selec- 
tion of an individual advertisement. 
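
The pairing rule above is mechanical enough to be written out as a short routine. What follows is only a sketch under stated assumptions: the record type `Ad`, its fields, and the function `pair_ads` are hypothetical names, and `recency` stands for the order in which an advertisement is met when paging backwards through the issues (0 being met first).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ad:
    product: str
    brand: str
    color: bool    # True for 4-color, False for black and white
    recency: int   # 0 = first advertisement met when paging backwards


def pair_ads(ads):
    """Pair the most recent black and white with the most recent 4-color
    advertisement for every (product, brand) that has both kinds."""
    latest = {}
    for ad in sorted(ads, key=lambda a: a.recency):
        # setdefault keeps only the first (most recent) ad of each kind.
        latest.setdefault((ad.product, ad.brand, ad.color), ad)

    pairs = {}
    for (product, brand, color), ad in latest.items():
        if color:
            black_and_white = latest.get((product, brand, False))
            if black_and_white is not None:
                pairs[(product, brand)] = (black_and_white, ad)
    return pairs


# Hypothetical listing: one soap brand with both kinds, one lamp brand in color only.
ads = [
    Ad("soap", "X", True, 0),
    Ad("soap", "X", False, 3),
    Ad("soap", "X", False, 5),
    Ad("lamp", "Y", True, 1),
]
pairs = pair_ads(ads)
print(pairs)
```

Because the routine never consults anything but product, trade name, color, and order of appearance, it reproduces the property claimed in the text: no judgment enters into the selection of an individual advertisement.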


The above procedure yielded ten pairs of advertisements, which ful- 
filled our requirement. Each pair consisted of a full-page black and white 
and a full-page 4-colored advertisement promoting the same product 
under the same trade name. Here are the products: 


An electric lamp A soap 

A sheet A soft-drink 

A cigarette An electric blanket 
A whiskey A wristwatch 

A railroad A dentifrice


One member of each pair was bound in the folder used with half the 
sample, and the other in the folder used with the other half. Five 4-color 
and five black and whites were included with each sample. Further selec- 
tions were made on the same arbitrary basis to provide two full-page ad- 
vertisements in 4-color and two in black and white advertising the same 
product (or service) but under different trade names. 

Five such quartets were found: Four makes of aeroplane; four makes 
of fabric for clothing; four makes of medicines; four makes of phonograph 
records; and four makes of cosmetics. 

Each half of the sample was tested on one black and white and one 
4-color advertisement of each of the products. Thus a total of 20 adver- 
tisements, half of them colored, was used with each respondent. Note 
that this selection was strictly arbitrary and automatically ruled out any 
prejudice or any flaw in judgment which might have influenced a selec- 
tion made by the free choice of the investigator. 

The sequence of the 20 advertisements in the binder was arranged in 
accordance with the following rules: 







1. Black and white and 4-color alternated except that the regularity was
broken by introducing two advertisements consecutively 4-color or black and
white at five points in the series. This was done so as to avoid the impression
that black and white advertisements were being compared with 4-color.

2. Advertisements promoting relatively costly, permanent acquisitions
were irregularly mixed in with those promoting inexpensive quickly consumed
items.

3. Similar irregularity governed the sequence of items which were one of a 
pair and those which were one of a quartet. Of the latter, the two belonging 
to the same quartet and, therefore, advertising the same product were sep- 
arated by at least four intervening advertisements. 


Adjustment for previous exposure to advertising and thus possible
variation in brand name familiarity was made in the ten cases where com-
peting advertisements promoted different brands.¹ This was done by
the insertion in the following correction formula of the false mentions
of a brand by respondents in that sample to which that brand was not
shown:


$$\text{Adjusted \% of sample correctly naming brand} = \frac{(\text{Obtained \% of correct mentions}) - (\text{\% of false mentions})}{100\% - (\text{\% of false mentions})}$$
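
As a worked illustration of the correction formula, the following sketch evaluates it for one brand. The function name `adjusted_percent` and the figures of 40 per cent correct mentions and 10 per cent false mentions are hypothetical and are not taken from the survey.

```python
def adjusted_percent(correct_pct, false_pct):
    """Apply the correction formula; both arguments and the result are
    percentages of the sample (0 to 100)."""
    if not 0 <= false_pct < 100:
        raise ValueError("false mentions must be at least 0 and below 100 per cent")
    # (obtained - false) / (100 - false), expressed again as a percentage.
    return 100.0 * (correct_pct - false_pct) / (100.0 - false_pct)


# Hypothetical brand: 40% named it after exposure, 10% named it without exposure.
print(round(adjusted_percent(40.0, 10.0), 1))
```

When no one in the unexposed sample names the brand, the adjusted figure equals the obtained figure, so the correction bears only on brands that were already familiar.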





Results 


The study gives two measures of 4-color versus black and white: The 
comparative tendencies of the two to arouse interest in an advertisement 
and their comparative tendencies to be recalled. 

Some respondents named only two or three advertisements as being
interesting; some mentioned many. Obviously the value of a mention
depends upon the number so named. We have, therefore, treated sepa-
rately the number of respondents naming 0, 1, 2, 3, etc., up to all 20 in-
teresting. Because the distribution of the two samples among the group-
ings according to number of advertisements named as “interesting” is
not identical, we weighted the raw figures in terms of the size of groups.
Thus we compared the group in Sample A naming five advertisements as
interesting, with the group in Sample B naming five advertisements.
But we did so not in terms of how many in the one mentioned the black and
white cigarette advertisement and how many in the other named the
4-color cigarette advertisement. Rather, we used percentages in each
case. This was a defensible procedure, but it failed to yield a single
index of the relative “interest value” of the two members of a pair. To
arrive at such a figure we first determined the similarity in judgment
among the several groups and discovered a reasonably close relationship
among respondents who name as interesting over three and fewer than
16 of the 20 advertisements. The remaining groups behaved eccentri-
cally. We, therefore, combined the findings for the twelve groups naming
from four to fifteen advertisements inclusive, by securing the algebraic
sum of the differences in per cents naming the 4-color and the black and
white advertisement in each pair.

Actually, inclusion of the few individuals naming more than 15, or
fewer than four advertisements does not appreciably alter the findings.


1 This adjustment is the same as the confusion control used by D. B. Lucas.


Table 4

The Mean Difference in Expressed Interest Between Black and White and 4-Color
and an Estimate of the Error of this Difference

                        Mean          Mean Difference in
                     Difference       Multiples of its σ*

A. Differences in favor of 4-color

Fabric A                36.6                 ...
Medicine A              33.4                 ...
Railroad                31.8                 ...
Cosmetic A              19.8                 ...
Blanket                 19.7                 ...
Aeroplane B             18.0                 ...
Whiskey                 15.5                 ...
Fabric B                14.6                 ...
Record A                13.1                 ...
Cosmetic B              12.6                 ...
Medicine B              10.3                 ...
Record B                 9.0                 ...
Cigarette                8.8                 ...
Sheet                    7.5                 ...
Dentifrice               4.6                 ...
Soft drink               3.3                 ...
Watch                    3.3                 ...

B. Differences in favor of black and white

Electric lamp            3.6                 1.59
Aeroplane A              4.1                 1.73
Soap                     8.2                 3.04

[Entries shown as "..." are illegible in the source copy.]

* For each pair the differences in the 12 groups (mentioning four, five, six, etc.)
were averaged and the σ of these 12 differences was computed. This σ was divided
by √n to give an estimate of the σ of the mean difference. Each mean difference was
then divided by the σ of that mean difference.
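
The computation described in the footnote to Table 4 can be sketched as follows. The function name and the eight sample differences are hypothetical (the study used twelve groups per pair), and the σ is taken here as the root-mean-square deviation with divisor n, which is an assumption about the authors' practice.

```python
import math


def mean_difference_in_sigma_units(differences):
    """Return (mean difference, mean difference / sigma of the mean).

    sigma is computed with divisor n (an assumption), then divided by
    sqrt(n) to estimate the sigma of the mean difference.
    """
    n = len(differences)
    mean = sum(differences) / n
    sigma = math.sqrt(sum((d - mean) ** 2 for d in differences) / n)
    if sigma == 0:
        raise ValueError("all differences are identical; ratio is undefined")
    sigma_of_mean = sigma / math.sqrt(n)
    return mean, mean / sigma_of_mean


# Hypothetical differences in per cent (4-color minus black and white) for
# eight groups of respondents.
mean, ratio = mean_difference_in_sigma_units([2, 4, 4, 4, 5, 5, 7, 9])
print(mean, round(ratio, 2))
```

A ratio of 3 or more would ordinarily be read as a difference unlikely to have arisen from the variation among groups alone.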


The combined results are given in Table 4. In the first column is
given the difference in per cent of people naming the 4-color and the
black and white advertisement of each pair. The base in every case is
the number of people tested on the advertisement.² In the second col-
umn is given the ratio of average difference to the σ of the average dif-
ference, as an indication of the degree to which the weighted difference
is tenable.


2 In pairs marked A the 4-color member was shown subsample A and in pairs marked
B the 4-color member was shown subsample B.

The impact data were treated in the same way. Table 5 gives the 
results. 

Table 5

The Mean Difference in Expressed Impact Between Black and White and 4-Color
and an Estimate of the Error of this Difference

                        Mean          Mean Difference in
                     Difference       Multiples of its σ*

A. Differences in favor of 4-color

Railroad                20.7                 ...
Fabric A                19.0                 ...
Medicine A              15.8                 ...
Cosmetic A              15.4                 ...
Record A                14.3                 ...
Cosmetic B              14.1                 ...
Fabric B                 9.8                 ...
Cigarette                8.1                 ...
Electric lamp            7.8                 ...
Blanket                  7.0                 ...
Record B                 5.5                 ...
Aeroplane B              3.3                 ...
Dentifrice               1.8                 ...
Soap                     1.0                  .42

B. Differences in favor of black and white

Sheet                     .4                 ...
Soft drink               1.0                 ...
Whiskey                  4.6                 ...
Watch                    5.0                 2.21
Medicine B              10.0                 3.29
Aeroplane A             11.3                 3.95

[Entries shown as "..." are illegible in the source copy.]

* For each pair the differences in the 12 groups (mentioning four, five, six, etc.)
were averaged and the σ of these 12 differences was computed. This σ was divided
by √n to give an estimate of the σ of the mean difference. Each mean difference was
then divided by the σ of that mean difference.


Obviously when the black and white and colored members of a pair 
are compared the latter, in most cases, has the advantage in both interest 
and impact value. When judged by the ratio of difference to error esti- 
mate, the advantage seems to be greater in the case of impact. One 
might ask, how does the factor stand out among the many others which 
are undoubtedly related to an advertisement’s effectiveness? 







A certain degree of uniformity exists among the advertisements com-
pared in this study. Any two promote the same product and were pre-
pared by the same advertising agency. In half the cases they promote
the same trade name. All were full-page advertisements financed by
firms which had bought space in a magazine with a very large circulation
and audience, and which were, therefore, under the same pressure to util-
ize the space to advantage. All appeared within a four-month period.
The only factor deliberately and constantly contrasted was that of 4-color
versus black and white. Nevertheless, other factors, uncontrolled and
quantitatively unidentified, did vary. In some cases one member of a
pair had more text than the other. In all cases the wording was some-
what different as was the illustration. In a few the “appeal” was dif-
ferent. It is possible, in the case of half of the advertisements used (the
quartets) to estimate the total effect of these uncontrolled factors upon
the interest and recall values. It will be remembered that each quartet
consisted of two 4-color and two black and white advertisements, all
promoting the same product. We can hold the color factor constant by
comparing the two 4-color members of each quartet with each other, and
by comparing the two black and white members with each other.


Table 6

Comparison of Interest Indices Where Quartets were Used

                                             4-Color versus 4-Color and
   4-Color versus Black and White            b. & w. versus b. & w.

                 Mean     Mean Diff. in                         Mean     Mean Diff. in
                Differ-   Multiples                            Differ-   Multiples
                ence      of its σ                             ence      of its σ

Fabric A         36.6      11.35        Fabric, color           35.7       8.10
Fabric B         14.6       2.94        Fabric, b. & w.         13.6       3.05

Medicine A       33.4      11.72        Medicine, color         28.6      10.13
Medicine B       10.3       2.33        Medicine, b. & w.        5.5        .99

Cosmetic A       19.8       6.89        Cosmetic, color          8.0       ...
Cosmetic B       12.6       4.56        Cosmetic, b. & w.         .8       ...

Record A         13.1       4.06        Record, color            5.5       1.68
Record B          9.0       3.38        Record, b. & w.          9.6       3.79

Aeroplane A       4.1*      1.73*       Aeroplane, color         4.6       1.28
Aeroplane B      18.0       3.79        Aeroplane, b. & w.      17.5       4.89

[Entries shown as "..." are illegible in the source copy.]

* Difference in favor of black and white. All other differences in these two columns
favor color.





Value of Color in Advertising 269 


Table 6 presents the interest values and for comparison the 4-color 
versus black and white values are repeated. 

Judging by the ratios of mean difference to the σ of mean difference,
we find that four of the five quartets contain a pair wherein the 4-color
superiority is greater than the advantage of one or the other member in
either case where color is compared to color, and black and white to black
and white. One quartet, aeroplanes, contains a black and white
advertisement more superior to the other black and white than either difference
in the color tests. In interest value, then, color usually outweighed
other factors.

This is not true in the case of impact, however. Impact values 
similar to interest values in Table 6 are given in Table 7. 


Table 7

Comparison of Impact Indices Where Quartets were Used

                                            4-Color versus 4-Color and
  4-Color versus Black and White            b. & w. versus b. & w.

               Mean      Mean Diff. in                          Mean      Mean Diff. in
               Differ-   Multiples                              Differ-   Multiples
               ence      of its σ                               ence      of its σ

Fabric A       19.0       4.2           Fabric, color           22.4       6.41
Fabric B        9.8       ...           Fabric, b. & w.         11.6       3.04

Medicine A     15.8       ...           Medicine, color         45.2      10.06
Medicine B     10.0*      ...           Medicine, b. & w.       20.3       8.65

Cosmetic A     15.4       ...           Cosmetic, color         10.3       1.68
Cosmetic B     14.1       ...           Cosmetic, b. & w.       10.8       3.67

Record A       14.3       ...           Record, color           17.4       5.64
Record B        5.5       ...           Record, b. & w.         23.0       5.18

Aeroplane A     3.3       ...           Aeroplane, color         3.9       ...
Aeroplane B    11.3*      ...           Aeroplane, b. & w.      18.4       7.81

* Difference in favor of black and white. All other differences in these two columns
favor color.


In impact values it seems clear that the standard differences are usually
larger when color is compared with color and black and white with
black and white than they are when color is compared with black and
white. Apparently the other factors in combination exercise more
influence than does color in the quartets. It may very well be that this
would be found true of the pairs also, had we a way to test the possibility.







Conclusions 

Obviously, the value of color in advertising depends upon a number
of matters, such as the skill with which it is used, the adaptability of the
product to black and white portrayal and so on.

These tests indicate that a further consideration exists: the purpose
of the advertiser. They suggest that in the promotion of a new brand,
the creation of association between product and trade name, color is not
necessarily greatly superior. In the protection of an investment in a
familiar brand by keeping alive and increasing its reputation for quality,
color appears to have a greater advantage over black and white. It is
possible that careful review of purpose in relation to the added cost of
color may help to curb a trend toward uncritical selection of expensive
presentation.


Received July 22, 1946. 





Studies in Item Analysis: 1. The Effect of Two Methods 
of Item Validation on Test Reliability * 


C. H. Lawshe, Jr. and James S. Mayer 
Division of Applied Psychology, Purdue University 


The problem of selecting the ‘‘best” items from a longer test is one that 
has long confronted both test makers and those who seek to institute 
selective scoring procedures for commercial tests on specific jobs. Since 
the items which make up a test determine both its reliability and its valid- 
ity, there is a need for empirical studies in the field. 

Purpose of the Study. The purpose of this study was two-fold: (1) to 
compare two methods of item analysis in order to see which gives the 
more reliable test as measured by standard split-half techniques; (2) to 
determine how much more reliable a short test with good items is than a 
longer test composed of good and bad items. 


Procedure 


The Test and Population. Items selected from 300 Elementary Psy- 
chology examination questions were used as a basis for this study. There 
were 517 papers. The items were of the four response, multiple choice 
variety of which only one was considered correct. The questions were 
graded right or wrong and the final score was the number of correct re- 
sponses. 

Control Group. The papers were ranked from the highest to the lowest 
on the basis of the total score derived from the 300 items. Every fifth 
paper was removed. These 103 papers were set aside as a control or sec- 
ondary group. 

Selection of Items. After the control group was removed, the highest 
and lowest 27% of the remaining 414 papers on the basis of total scores 
were selected as criterion groups. Each of the 300 items was analyzed 
to find what percentage of the high criterion group gave the correct
response. Two indexes of item validity¹ were determined for each item.


* The authors acknowledge the aid of William H. Angoff of the Division of Applied
Psychology, Purdue University, for editorial assistance and for suggesting the method
of statistical treatment.

¹ The use of the word “validity” in this connection has been criticized by some.
Since no external criterion is available, however, in subject matter achievement tests
and since the test was constructed so as to sample the field adequately, the term is used
advisedly here.




The first was a correlation method devised by Flanagan (1), employing a
chart which yields a correlation coefficient estimated from obtained
proportions of the upper and lower 27%. The second method was the
D-value method based on Lawshe’s nomograph (3), adopted from Kelley’s
technique (2), and likewise employed the upper and lower 27% based
upon total score.
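
The grouping procedure described above (rank the papers, set aside every fifth as a control group, then take the upper and lower 27% of the remainder) can be sketched as follows. This is only an illustration: the function name `split_groups` is hypothetical, and the rounding of the 27% cut is an assumption, since the report does not state the exact size of the criterion groups.

```python
def split_groups(scores, step=5, fraction=0.27):
    """Rank papers by total score, set aside every `step`-th paper as a
    control group, and return the top and bottom `fraction` of the rest."""
    ranked = sorted(scores, reverse=True)
    # Every fifth paper in rank order goes to the control (secondary) group.
    control = ranked[step - 1::step]
    primary = [s for i, s in enumerate(ranked) if (i + 1) % step != 0]
    # Size of each criterion group; the rounding rule is an assumption.
    k = round(fraction * len(primary))
    return control, primary[:k], primary[-k:]


# With 517 papers this reproduces the 103 control papers and the
# 414 remaining papers mentioned in the text.
control, high, low = split_groups(list(range(517)))
print(len(control), len(high), len(low))
```

Taking every fifth paper from the ranked list keeps the control group spread over the whole range of total scores, which is presumably why the papers were ranked before the removal.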

Comparison of Methods. As indicated, both of the methods provide
estimates of an item’s validity from the proportion of successes in the two
groups consisting of the upper and lower 27% of the cases on the
total test score. Somewhat different assumptions underlie the two
techniques, however. The Flanagan r method provides an estimate of the
degree of relationship between the item and the criterion (total score in
the present instance) by means of an approximated biserial coefficient
of correlation. These estimates or approximations are obtained from a
simple table in which the two proportions are located. On the other
hand, the Kelley technique measures item validity in terms of the
distance in sigma units between the ordinates which cut off the respective
proportions of successes in the upper and lower groups. Lawshe’s
nomograph permits the reading of the sigma differences directly from a
calibrated scale once the two proportions are located on the nomograph.
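
The sigma-unit distance described for the Kelley technique can be approximated numerically. The sketch below assumes a unit normal distribution and takes the difference between the normal deviates corresponding to the two proportions of successes; the function name `d_value` is hypothetical, and the code is not a reproduction of the published nomograph, whose scale may differ in detail.

```python
from statistics import NormalDist


def d_value(p_upper, p_lower):
    """Distance in sigma units between the normal deviates that correspond
    to the proportions passing the item in the upper and lower groups.
    Both proportions must lie strictly between 0 and 1."""
    unit_normal = NormalDist()  # mean 0, sigma 1 (assumed)
    return unit_normal.inv_cdf(p_upper) - unit_normal.inv_cdf(p_lower)


# An item passed by 84% of the upper group and 50% of the lower group
# separates the groups by roughly one sigma unit.
print(round(d_value(0.84, 0.50), 2))
```

An item passed equally often by both groups gets a value of zero on this scale, whatever its difficulty.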

Both of the methods are intended to produce validity indexes free from 
the effects of item difficulty. As suggested by Flanagan (1), item validity 
and item difficulty are two aspects of item selection which should be 
treated independently. Selection of items by either method results in 
a random choice of items insofar as difficulty level is concerned. 

Time Factor. Although the statistical assumptions underlying the 
two methods differ, the mechanics and the clerical work involved are 
practically identical. One uses a table and the other uses a nomograph 
and the operations involved are so nearly alike that any time difference is 
negligible. In the case of either method the functions of scoring and 
making item counts constitute the bulk of the time required. In the 
ordinary mimeographed test, mechanical layout and arrangement are so
important and can be so varied that estimates of time required are
difficult. However, the use of separate answer sheets and the IBM test
scoring machine greatly facilitates these operations. Flanagan (1)
reports that to score and item count one hundred and fifty items for 400
cases with the IBM equipment should require only a little over three 
hours. The time required for processing the data in the present study 
substantiates Flanagan’s statement. 

Reliability Coefficients. After the validity indexes were found, the 
papers in the primary group were set aside and the 103 constituting the 
secondary group were used. Twelve tests were made as a result of the
item analysis. Six of these consisted of the best 20, 40, 60, 80, 100 and
120 items selected on the basis of the D-values. As shown in Table 1,
the minimum D-value of items selected decreased from 1.4 for the best
20 items to 1.0 for the best 120 items, while the mean D-value decreased
correspondingly. By the split-half method, reliability coefficients were
obtained for each of the six tests.

Another six tests of the same lengths were made from the best items
selected by the correlation or “r” method. Table 1 also shows minimum


Table 1

Minimum Item Validity Index and Mean Values

Number          Flanagan “r” Values            D-Values
of Items        Minimum      Mean          Minimum      Mean

   20             .52        .561            1.4         ...
   40             .47        .520            ...         ...
   60             .45        .502            ...         ...
   80             .43        .486            ...         ...
  100             .41        .471            ...         ...
  120             .38        .458            1.0         ...


“r” values ranging from .52 for 20 items to .38 for 120 items with mean
“r” values likewise diminishing with the addition of items. Coefficients
of reliability were also obtained for these tests. Thus it was possible
to compare the reliability of a test composed of the best 20 items selected
by each method and to make additional comparisons for each of the
longer tests.
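
The split-half reliabilities and the "stepped-up" values reported later can be illustrated with a short sketch. Two points are assumptions rather than statements from the report: the halves are formed here from odd and even items (the report does not say how the halves were made), and the stepping up is taken to be the usual Spearman-Brown formula. The function names are hypothetical.

```python
import math


def pearson(x, y):
    """Product-moment correlation between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_brown(r_half, factor=2):
    """Step a half-test correlation up to the full-length reliability."""
    return factor * r_half / (1 + (factor - 1) * r_half)


def split_half_reliability(item_matrix):
    """item_matrix holds one row per paper, with 1 for a right answer and
    0 for a wrong one. Returns (half-test r, stepped-up r)."""
    odd = [sum(row[0::2]) for row in item_matrix]   # assumed odd-even split
    even = [sum(row[1::2]) for row in item_matrix]
    r_half = pearson(odd, even)
    return r_half, spearman_brown(r_half)


# A half-test correlation of .71 steps up to about .83.
print(round(spearman_brown(0.71), 2))
```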

Results 

Similarity of Selections. The first consideration was the similarity of
items which both methods selected and results are presented in Table 2.
The range of agreement was 70% for 20 items to 90% for 100 items.
The scattergram in Figure 1 shows the amount of agreement and
disagreement between “r” values and D-values. Although this figure shows
that there was considerable agreement, Table 3 shows that differences in
means and standard deviations were obtained in spite of the overlap.
The effects of these differences were reflected in the correlations which
were obtained.


Table 2

Similarity and Differences of Items Selected by Two Methods of Item Validation

Items in       Same        Different     % of Same
Tests          Items       Items         Items

   20            14            6           .70
   40            31            9           ...
   60            48           12           .80
   80            68           12           .85
  100            90           10           .90
  120           107          ...           ...


Fig. 1. Scattergram showing the relationship between D-values and “r’s”
read from Flanagan’s table.

Statistical Test of Differences. In examining the differences between
the reliabilities of the tests of the same length obtained by the two
methods of item selection, Fisher’s z (or z′, as it is sometimes referred to)
values were obtained for the separate correlations, and the tests of
difference were applied to the z values. It is noteworthy that the test of





Table 3

Means and Standard Deviations of Tests Constructed by Both
Methods of Item Validation

Number              D-Value                    Flanagan r
of Items                  Standard                    Standard
               Mean       Deviation        Mean       Deviation

   20          13.1         4.8            ...          3.0
   40          29.0         4.7            ...          5.2
   60          43.7         8.0            ...          9.0
   80          58.5        10.7            ...         10.7
  100          73.8        10.4            ...         13.4
  120          90.3        14.8            ...         16.7
  300*        213.7        31.6           213.7        31.6

* No selection of items.


differences between the z-values specifies the condition that the
correlations be taken from independent random samples. If the condition is
not satisfied, the differences which would actually be significant might
not appear to be so by this test. In the present study the condition is
not met, and therefore it is very likely that the D_z/σ_dz values given here
are underestimates of the true D_z/σ_dz values.
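
For reference, the usual form of the test on Fisher's z can be sketched as below. The sketch uses the standard formula for two independent samples, which is exactly the condition the authors note is not met here, so it illustrates the textbook test rather than reproducing their computation. The function names are hypothetical.

```python
import math


def fisher_z(r):
    """Fisher's z transformation of a correlation coefficient."""
    return math.atanh(r)


def z_difference_ratio(r1, n1, r2, n2):
    """Ratio of the difference between two z values to its standard error,
    assuming the correlations come from independent random samples."""
    standard_error = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    return (fisher_z(r1) - fisher_z(r2)) / standard_error


# Illustrative values only: two correlations, each based on 103 cases.
print(round(z_difference_ratio(0.80, 103, 0.70, 103), 2))
```

When the two correlations are computed on the same papers, as in this study, the standard error above is generally too large, which is the sense in which the reported ratios are called underestimates.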


Table 4 


Correlations and Reliability of Differences 


                                         Stepped-up
     Correlations                       Correlations

D-Values    “r” Values             D-Values    “r” Values


71 70 83
69 84 81
16
80 94 88
81
80 92 88
3 80 88





Results of the Reliability Study. The results shown in Table 4 and
Figure 2 indicate that the “r” method yields approximately the same
reliability coefficients for the different lengths of tests, while the D-value
method gives fluctuating reliabilities, but yields increasingly higher
reliabilities for tests of greater moderate length.

While the “r” method possibly yields somewhat higher reliability for
the short test, the D-value method definitely yields much higher
reliabilities for longer tests, particularly of 100 items. Here the difference in
stepped-up reliabilities is .06, a difference which is significant far beyond
the 1% level. On the other hand, the only short test which shows a
statistically significant difference between stepped-up reliabilities in favor
of the “r” method is the test of 20 items. Here the D_z/σ_dz
value is -2.27, which is to say that there are about two chances in one
hundred that such a difference could occur as a result of chance factors
alone.


Fig. 2. Split-half reliability coefficients (stepped up) for varying length tests
composed of items selected by the two techniques. The dotted line indicates the tests based
on the Flanagan method while the solid line indicates those based on the D-value
technique. Horizontal axis: number of items (0 to 300).


Optimum Test Length. The difference between reliability coefficients
for 100 items and for 300 items in considering the “r” method is not
significant. The difference between the 100 items selected by the D-value
method and the 300 items is .07 correlation point, this difference
yielding D_z/σ_dz of 3.22 which is significant beyond the 1% level of confidence.
In this case, the D-value approach selected a short test that is more
reliable than the longer one.


Conclusions 


Two methods of item validation, the Flanagan “r” method and the
D-value method, were subjected to a statistical comparison in order to
discover which yielded items that created the more reliable test. The
following conclusions are supported:

1. The two methods of item validation identify a high proportion of
common items, the percentage of overlap ranging from 70 in the
case of 20 items to 90 in the case of 100 items.

2. In selecting 40 and 60 items, the two systems yielded tests with
reliability differences sufficiently low that they might reasonably
have occurred through chance.

3. The “r” method selected a 20 item test with a somewhat higher
reliability than the one of corresponding length selected by the
D-value method, the probability that such a difference might have
occurred through chance alone being one in a hundred or even less
since significance measures are probably under-estimates.

4. When the 80 or 100 “best” items were identified by the two
methods, the coefficients of reliability of the tests selected by the
D-value method were enough higher that the differences are not
likely to have occurred through chance.

5. While the “r” method never produced a shorter test that was
significantly more reliable than the total test, the D-value method
produced a 100 item test with a reliability coefficient so much higher
than that of the total test that the chances are more than one in a
hundred that the difference could not have occurred through chance
alone.

Received December 16, 1946. 
Early publication. 


References 


1. Flanagan, J. C. General considerations for the selection of test items and a short
method of estimating the product moment coefficient of correlation at the tail
end of the distribution. J. educ. Psychol., 1939, 30, 674-680.
2. Kelley, T. L. Selection of upper and lower groups for the validation of test items.
J. educ. Psychol., 1939, 30, 17-24.
3. Lawshe, C. H., Jr. A nomograph for estimating the validity of test items. J. appl.
Psychol., 1942, 26, 846-849.
4. Lindquist, E. F. Statistical analysis in educational research. New York: Houghton
Mifflin Co., 1940, 214-218.
5. Peters, C. C., and Van Voorhis, W. R. Statistical procedures and their mathematical
bases. New York: McGraw-Hill Book Co., 1940, 185-188.






The Ability of College Art Majors to Recombine Ideas
in Creative Thinking *


V. R. Fisichelli and L. Welch
Institute for Research in Clinical and Child Psychology, Hunter College


In a previous study¹ designed to observe the part played in creative
thinking by the ability to recombine ideas according to plan, one of the
present authors constructed a special test in which the subject was obliged
to recombine familiar ideas according to four different plans. In that
study the test performance of unselected college juniors and seniors was
compared with that of a group of successful professional artists, and a
statistically significant difference in mean performance score was found
in favor of the professional group.

The test itself, which is described in greater detail in the study already
mentioned, consists of four separately given parts: (1) constructing
meaningful sentences, (2) constructing letters of the alphabet, (3) constructing
a short story, and (4) constructing pieces of furniture from wooden blocks.
The specific instructions for each part of the test are as follows:


Part I. Recombine the words of each group on the next page to make as
many meaningful, grammatical sentences as possible. For example, here is a
group of ten words: MEN SKY IS FIGHT THAT THE SLOW
BRIGHT OF FOR which can be recombined into the following sentences:

Men fight for the sky.
The sky is bright.
The fight is slow.
Etc.

You will receive as much credit for a short sentence as for a long one.
Your sentences do not have to be artistic, but they must be grammatical.
There must be at least a subject and a predicate. You will receive credit for
a sentence which is only slightly different from another. A word from the
group can be used only once in the same sentence, but it may be used any
number of times in other sentences. Only use words from the group
you are examining at the time. You may skip from one group to another if
you like.

There are ten of these groups and you have but ten minutes in which to
complete the test. Are there any questions? . . . Do not turn the page until
the examiner says “Start.”

Part II. Make as many letters as possible using no more and no less than
three straight lines. For example, the letter A is made with three straight
lines, two slanting downward and one across. You will be given no credit for
the letter A, since it is an example.

Make as many letters as possible using no more and no less than two
straight lines.

Make as many letters as possible using no more and no less than one
straight line and one semi-circle.

The time limit is three minutes.

(The three separate sets of instructions contained within this part were
printed on the same page with considerable working space between them.)

Part III. On the next page you will be given a list of twenty words which
you are to connect into a story. You must be certain to use the words in the
order in which they appear on the list. If the first word on the list is “house”
and the second word is “tree,” you must first make use of the word “house”
in your story and then make use of the word “tree.” You must not skip any
of the words.

Your story must be grammatical and logically related. It must have a
beginning and an end. You will be rated on the number of words you can
make use of in the time allotted. Write as fast as you can and underline each
of the twenty words as you use it.

The time limit is three minutes.

Part IV. The object of this test is to construct out of ten blocks on each
trial, as many pieces of furniture or home furnishings as possible. The piece
of furniture you construct must fit properly. It must be symmetrical and
recognizable as a piece of furniture. Do not attempt to be futuristic. Use
conventional forms. You must use a minimum of two blocks to construct a
piece of furniture. You can use the same block over again to make another
piece of furniture. You can make as many of the same type of furniture as
you like. You will receive full credit for the same type that is only slightly
different from another.

You have only ten minutes to complete this test. There are five trials;
hence, you have only two minutes for each trial.


* The authors are grateful to Miss Phyllis Aronoff for her gracious assistance in this
investigation.
¹ L. Welch, Recombination of ideas in creative thinking. J. appl. Psychol., 1946,
30, 638-643.

The subjects in this investigation were 25 female art majors in their
junior and senior years at Hunter College. They were chosen from the
upper class years to insure that they had all successfully passed the
preliminary screening courses offered by the art department. These
students were, thus, promising young artists with demonstrated talents and
special abilities in their chosen field. The first three parts of the test
were administered to them in group form. The fourth part was given
individually at a uniform time and under uniform conditions.


Results 


The test results of the 25 art majors in this study were compared with
those of the two groups, professional artists and unselected upper-class
students, reported in the previous study. The mean scores and standard
deviations for each group on each part of the test are shown in Table 1.

It will be seen that there is a striking difference in the overall score
between the art majors and the unselected college students. The
difference between the professional artists and the unselected students has
already been mentioned in the previous study. The differences between
the groups on the different parts of the test are not quite as striking. All





Table 1

The Mean Performance Scores and the Standard Deviations for Each Group
on Each Part of the Test

                 Professional                            Unselected
                 Artists             Art Majors          Students
                 N = 30              N = 25              N = 48

                 Mean     S.D.       Mean     S.D.       Mean     S.D.

I                ...       7.2       21.9      6         18.0     ...
II               ...       1.9       13.2      1.0        6.7     ...
III              ...       4.1        7.3      2.5        9.1     ...
IV               18.4      7.8       13.9      9.1        3.4     ...

Total Score      60.5     12.3       56.4     15.1       37.6     ...








of the differences, however, were put to test and some significant t-values
were obtained. The t-values obtained for differences between the means
of the three groups on each part of the test and between the overall scores
are presented in Table 2.


Table 2

The t Values Obtained for Differences Between Means of Groups

                 Professional       Professional       Art Majors
                 Artists            Artists and        and
                 and                Unselected         Unselected
                 Art Majors         Students           Students

I                  ...                0.2                ...
II                 1.5                ...                ...
III                ...                1.1                2.4**
IV                 2.0*              10.2***             ...

Total Score        1.1                ...                ...

* t.05 = 2.0.
** t.02 = 2.3.
*** t.01 = 2.6.

It appears that, for the overall test score, the difference between the 
unselected students and the art majors is statistically significant, while 
that between the art majors and the professional artists is not. Parts 
II and IV of the test seem to be especially important. The differences 
between the groups on these two parts seem to be consistently significant. 
On Part II, both the professional artists and the art majors receive sig- 
nificantly better scores than the unselected students. On Part IV, the 
professional artists are the best performers, the art majors second best, 







and the unselected students last. For Parts I and III of the test, which 
require constructions along literary lines, the differences between the 
groups are not consistently significant. On Part I, the order of best 
performance is art majors, unselected students, and professional artists. 
On Part III the order is professional artists, unselected students, and 
art majors. 

In order to determine the reliability of the method of scoring the test,
three scorers worked independently on all papers. The degree of
agreement between scorers is relatively high. The correlation coefficients
obtained between scorers ranged from .934 to .978. A list of acceptable 
constructions for Part IV of the test is now being prepared for future use. 
Although there was little disagreement between scorers, most of such 
disagreement was on this part of the test. 

In the previous study a correlation coefficient of .27 was reported 
for the scores on this test and the Wonderlic Personnel Test. In this 
study the test scores were correlated with the general scholastic index of 
the subjects and a product-moment coefficient of .45 was found, which, 
for 25 subjects, is significant at the five per cent level. Performance on 
the test for recombination of ideas appears, therefore, to be related to the 
general intelligence of the art major group in no minor degree. It should
be noted, however, that the unselected students of the previous study
were not, by comparison with the art majors, a less intelligent group.
On the American Council of Education Psychological Test the mean
score for art majors was 114.9 and for the unselected students, 112.7.
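
The statement that a coefficient of .45 is significant for 25 subjects can be checked with the usual t test for a product-moment correlation. The sketch below assumes that test and a two-tailed 5 per cent criterion with n - 2 = 23 degrees of freedom, for which the critical t is about 2.07. The function name `t_for_r` is hypothetical.

```python
import math


def t_for_r(r, n):
    """t statistic for testing a product-moment correlation against zero,
    with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


t_value = t_for_r(0.45, 25)
critical_t = 2.069  # two-tailed 5 per cent point for 23 degrees of freedom
print(round(t_value, 2), t_value > critical_t)
```

The resulting t of about 2.4 exceeds the 5 per cent point, in agreement with the text.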


Summary and Conclusions 


The purpose of this investigation was to determine the ability of a 
group of college art majors to recombine ideas in creative thinking. A 
special test, constructed by one of the present authors, was given in which 
the subject was obliged to recombine familiar ideas according to four 
different plans. 


1. The performance of these art majors was compared with that of a 
group of professional artists and a group of unselected college students, 
both of which were examined in a previous study. The mean scores for 
the three groups are as follows: professional artists, 60.5; college art 
majors, 56.4; students unselected for major field of study, 37.6. The 
difference between the unselected students and the art majors is statis- 
tically significant while that between the art majors and the professional 
artists is not. 

2. Parts II and IV of the test seem to have the most discriminative 
value where art is the field in question. For the three groups examined 
no consistent differences were obtained on the other parts of the test. 




3. There is some tendency for the total test scores to be related to the
general scholastic index of the art major group.

4. Finally, it should be mentioned that the function tested in this 
study was strictly a quantitative one, and apart from the fact that the 
recombined ideas of all subjects had to fit the simple criteria of
conventional usage and symmetry of form, nothing was demonstrated concerning
the qualities of their production. 


Received June 19, 1946. 





The Occupational Level Scale of the Strong Vocational 
Interest Blank for Men * 


William E. Kendall 


Syracuse University 


The clinical significance of scores on specific occupational scales of
the Strong Vocational Interest Blank for Men has been the subject of
detailed discussion by Strong¹, Darley² and others. The significance of
the non-occupational scales, occupational level in particular, is not clear
to most counselors. Strong, for example, describes occupational level
(OL) in these terms: “Men with high OL scores have the interests of
business executives and professional men, but those with low scores have the
interests of workmen.”³ In discussion of the OL scale, Darley defines
occupational level as “a quantitative statement of the eventual adult
level of aspiration,” which “represents the degree to which the individual’s
total background has prepared him to seek the prestige and discharge
the social responsibilities growing out of high income, professional status,
recognition or leadership in the community; at the lower end of the scale
the individual’s background has prepared him for the anonymity, the
mundane round of activities and the ‘followership’ status of a great
majority of the population.”⁴

Darley further reports “clinical experience together with limited
experimental data would indicate that the lowest occupational level scores
on the revised blank will accompany the interest type previously defined
as ‘lower level jobs.’ Furthermore, an excessively low occupational
level score seems at present to be associated with lack of ‘staying power’
or ‘survival power’ in college competition.”⁵

An interesting observation has been made in connection with a sales
selection program where a small group of men engaged in technical
product sales for one company were studied. In general, the successful
salesmen appeared to have high OL scores while those men who were
unsatisfactory or who later quit after short tryout periods seemed to have
low OL scores. While this experience is in keeping with the general
statements quoted previously, the question arose as to whether the findings
were a chance difference or whether the OL score was making a real
contribution to prediction of success on these selling jobs.

* This paper is one in a series reporting research in tools and techniques of counseling
being conducted at the Psychological Services Center of Syracuse University. It is
being printed as an “early publication,” the author paying complete costs.

¹ Strong, E. K. Vocational interests of men and women. Stanford University:
Stanford University Press, 1943.

² Darley, J. G. Clinical aspects and interpretation of the Strong Vocational Interest
Blank. New York: The Psychological Corporation, 1941.

³ Strong, E. K. op. cit., page 195.

⁴ Darley, J. G. op. cit., page 60.

⁵ Ibid., page 66.

Unfortunately, in the business and industrial situation it is diffiey|t 
to get large groups sufficiently homogeneous for study. Since this pre 
liminary investigation was to deal with group differences large groups 
were considered necessary. It was thought desirable therefore, to use 
academic records of college students for this analysis. In the college 
situation, the failure of measures of academic ability to predict scholasti 
performance with exactitude is commonly attributed to differential moti- 
vation or other residual variables unaccounted for by the ability measure 
If the OL scale is, in fact, a measure of some aspects of motivation, as 
Darley suggests, then groups selected on the basis of OL score alone 
should differ with respect to scholastic achievement. In one pertinent 
study, Strong® divided 140 students in the Graduate School of Business 
at Stanford University into four groups on the basis of grades over a year 
period. The upper quarter average was only one standard score higher 
than the average of those in the lowest quarter, scholastically. Correla- 
tions between grades and OL score and OL and academic aptitude test 
scores were low. Strong’s finding of homogeneity with respect to OL scores 
for these groups is not surprising when we consider the relatively restricted 
range of ability and of grades in a graduate group. In the present study 
it was felt that more information could be obtained by taking groups who
differed with respect to OL score and then comparing academic records. 
The specific problem investigated was this: if academic ability is held con- 
stant, can high, average and low OL groups be differentiated with respect 
to scholastic achievement? 


Procedure 


The Strong blank is administered to all entering male students at 
Syracuse University as a part of the freshman testing program. In
September 1946, 1,941 men were tested and from this group, three groups 
of 100 men each were drawn. In Group I were men whose OL raw score 
was 87 or higher (high OL group), in Group II were men whose OL raw 
score was within the range 5 to 27 (average OL group) and in Group III 
were men whose OL raw score was —33 or less (low OL group). The 
median raw score for Groups I, II, and III respectively, is 117, 18 and
—55. All men used in the study were in the first semester of their first 





Occupational Level Scale 285 


year. The median age for each of the groups was approximately 21
years, which reflects the preponderance of veteran students. The pro-
portion of each group in each college of registration, Business Adminis-
tration, Forestry, Fine Arts and Speech is uniform but the proportion of
students in Liberal Arts decreased from high to low OL groups while the
proportion of students in Applied Science increased correspondingly.

The measure of academic ability used was the Ohio State Psycholog-
ical Examination, Form 21, which is administered as a power test. The
measure of scholastic achievement was first semester hour-point ratio.
Grades were obtained from the University Registrar and the ratios


Table 1

Means and Standard Deviations of Ohio Psychological Examination Scores and of
Hour-point Ratios for High, Average and Low OL Groups (N = 100)

                            Ohio Scores         Hour-point Ratios
                            Mean      S         Mean      S

Group I (high OL)           96.90    22.76      1.4      .669
Group II (av OL)            89.34    22.82      1.3      .617
Group III (low OL)          84.02    24.47      1.1      .561


Table 2

Analysis of Variance of Hour-point ratios* (X)

Source of                  Degrees of    Sum of     Mean                Test of
Variation                  Freedom       Squares    Square     F        Hypothesis†

Between means of groups        2           6.141    3.071     7.98      reject
Within means of groups       297         114.357     .385
Total                        299         120.498

* For readers unfamiliar with the details of computation of the analysis of variance
and covariance, the definitive articles of Johnson and Tsao are recommended. Johnson,
P. O. and Fei Tsao. Factorial design in the determination of differential limen values.
Psychometrika, 1944, 9, 107-145; Factorial design and covariance in the study of indi-
vidual educational development. Psychometrika, 1945, 10, 133-162.
† Where F = greater mean square / lesser mean square. By referring to Snedecor's
tables of F (See Snedecor, G. W., Statistical Methods. Ames, Iowa: Collegiate Press,
1946, pages 222-225), we may use the following three rules in testing the hypothesis:
(a) reject the hypothesis tested, if the calculated value of F is greater than the 1% point
given in the tables; (b) accept the hypothesis tested, if the calculated value of F is less
than the 5% point given in the tables; (c) remain in doubt, if the calculated value of F
lies between the 5% and 1% points given in the tables.
The hypothesis tested is a null hypothesis concerning the difference between means
of groups, i.e., there is no significant difference between the means of groups.




determined on the basis of A = 3, B = 2, C = 1, D and F = 0 points.
The appropriate statistical technique to handle the problem as stated is
the analysis of variance and covariance.
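The hour-point ratio described above can be sketched as follows. The weighting of each grade by the credit hours of its course is an assumption made for this illustration (the text gives only the point values), and the sample record is hypothetical, not drawn from the study.

```python
# Point values as stated in the text: A = 3, B = 2, C = 1, D and F = 0.
GRADE_POINTS = {"A": 3, "B": 2, "C": 1, "D": 0, "F": 0}


def hour_point_ratio(courses):
    """Return grade points per credit hour.

    courses: list of (letter grade, credit hours) pairs.
    Weighting by credit hours is an assumption of this sketch.
    """
    total_hours = sum(hours for _, hours in courses)
    total_points = sum(GRADE_POINTS[grade] * hours for grade, hours in courses)
    return total_points / total_hours


# Hypothetical first-semester record (not data from the study).
example = [("A", 3), ("B", 3), ("C", 4), ("D", 2), ("B", 3)]
ratio = hour_point_ratio(example)
print(round(ratio, 2))
```

On this scale a ratio of 1.0 corresponds to a straight C record, which places the group means of Table 1 slightly above the C level.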


Table 3

Analysis of Variance of Ohio Psychological Examination Score (Y)

Source of                  Degrees of    Sum of     Mean
Variation                  Freedom       Squares    Square

Between means of groups        2           8378     4189
Within means of groups       297         163865      551.7
Total                        299         172243
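As a check on Tables 2 and 3, the F ratios and the three decision rules quoted in the footnote to Table 2 can be sketched in a few lines. The tabled 5% and 1% points used here (3.03 and 4.68 for 2 and roughly 300 degrees of freedom) are approximate values assumed for illustration and should be confirmed against Snedecor's tables. The F for Table 3 is computed here and is not legible in the source.

```python
def f_ratio(ss_between, df_between, ss_within, df_within):
    """Return (mean square between, mean square within, F)."""
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return ms_between, ms_within, ms_between / ms_within


def decide(f_value, f_5pct, f_1pct):
    """Apply the three rules quoted in the footnote to Table 2."""
    if f_value > f_1pct:
        return "reject"
    if f_value < f_5pct:
        return "accept"
    return "remain in doubt"


# Assumed approximate tabled points for 2 and about 300 degrees of freedom.
F_5PCT, F_1PCT = 3.03, 4.68

# Table 2: hour-point ratios (X).
_, _, f_x = f_ratio(6.141, 2, 114.357, 297)
# Table 3: Ohio Psychological Examination scores (Y).
_, _, f_y = f_ratio(8378, 2, 163865, 297)

verdict_x = decide(f_x, F_5PCT, F_1PCT)
verdict_y = decide(f_y, F_5PCT, F_1PCT)
print(round(f_x, 2), verdict_x)
print(round(f_y, 2), verdict_y)
```

The value for Table 2 agrees with the printed 7.98 to within rounding of the mean squares.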





Table 4

Complete Analysis of Variance and Covariance
(Partialling Out the Effect of Academic Ability)

                    Degrees   Sum of     Sum of               Degrees   Adjusted or
Source of           of        Squares    Squares    Sum       of        Reduced Sum    Mean
Variation           freedom   (x)        (y)        of xy     freedom   of Squares     Square

Between means
  of groups            2        6.141      8378      213.6       2         2.144       1.072
Within means
  of groups          297      114.357    163865     1967.5     296        90.734
Total                299      120.498    172243     2181.1     298        92.878
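The adjusted sums of squares in Table 4 can be reproduced from the other columns under the usual covariance adjustment, in which the adjusted sum of squares of x equals the sum of squares of x less the squared sum of products divided by the sum of squares of y. The following is a minimal sketch on that assumption. The resulting F is computed here rather than read from the table, and the comparison points 3.03 and 4.68 are assumed approximate tabled values.

```python
def adjusted_ss(ss_x, ss_y, sp_xy):
    """Sum of squares of x after removing the linear regression on y."""
    return ss_x - sp_xy ** 2 / ss_y


# Within-groups and total rows of Table 4.
within_adj = adjusted_ss(114.357, 163865, 1967.5)
total_adj = adjusted_ss(120.498, 172243, 2181.1)

# The adjusted between-groups term is obtained by subtraction.
between_adj = total_adj - within_adj

# One degree of freedom is lost to the regression: 297 - 1 = 296.
ms_between = between_adj / 2
ms_within = within_adj / 296
f_adjusted = ms_between / ms_within

print(round(within_adj, 3), round(total_adj, 3), round(between_adj, 3))
print(round(f_adjusted, 2))
```

The computed values agree with the printed 90.734, 92.878 and 2.144 to within rounding, and the F falls between the assumed 5% and 1% points, which is consistent with the "doubtful region" reported in the Discussion.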





Discussion 


The three groups which were chosen on the basis of OL score were
found to differ significantly (P < .001) with respect to scholastic achieve-
ment as measured by hour-point ratio and with respect to academic
ability as measured by the Ohio Psychological Examination. When aca-
demic ability as measured was held constant, the groups were found to
differ with respect to achievement. This difference (.01 < P < .05) is
in the doubtful region and certainly requires further study.

If we may consider scholastic performance as a function of variables
such as academic ability, interests appropriate to the curriculum pursued,
motivation properly directed, etc., then the data of the present study have
meaning. For example, in the present study if ability were the sole
determiner of grade record, in the final analysis the groups should not dif-
fer significantly. The difference which did appear is attributed to some
difference between the groups which is being measured by the OL scale.
Without attempting to specify the precise nature of the variable measured
by the OL scale, it would seem that the scale is measuring a variable re-
lated in part to academic ability (for example, see the data of Table 3)
and in part to motivational factors. If used with caution, OL scores at
the extremes of the distribution should be helpful to the counselor in
making judgments concerning individual chances for scholastic success.


Received February 29, 1947.







Susceptibility to Seasickness: A Questionnaire Approach * 


James E. Birren 


Northwestern University 


and M. Bruce Fisher 


Fresno State College, California 


Selection and placement of men according to their psychological and
physiological characteristics have been profitably exploited by many
branches of the military services. This report concerns a problem of the
same sort, the development of a means for identifying those who will be
most severely affected by seasickness prior to assignment to duty.

The use of a questionnaire to determine the past history of motion 
sickness was given a preliminary validation in predicting seasickness by 
comparing the questionnaire results on 48 men who had been sent from 
sea to shore duty because of severe and persistent seasickness, with
another small sample with sea experience who had not been so afflicted 
(2). The difference between the questionnaire scores of the two groups 
was highly significant and a more extensive study seemed indicated. 

A criticism frequently directed against questionnaires concerns re-
liability, in that the person answering the questionnaire can easily falsify
his answers. It seemed necessary, therefore, to determine the proneness 
of men to distort their answers when it appears to them that the results 
of the questionnaire might affect their future. This possibility has been 
evaluated as part of the present study. 


Procedure 
Validity. The question of how well a questionnaire-gathered history
of motion sickness would agree with demonstrated susceptibility to sea-
sickness was studied in a survey of the officers and men of a destroyer
escort. The personnel of this ship had had a long enough tour of sea
duty to know their own tendencies to become seasick as well as those of
their shipmates. Each man in the crew was ranked on his susceptibility
to seasickness by two persons who made their judgments independently.
The officers or petty officers making the judgments were those longest
associated with the person being judged.

* This material was prepared while the authors were on active duty in the U. S.
Naval Reserve at the Naval Medical Research Institute, Bethesda 14, Maryland. The
opinions expressed are those of the authors and not necessarily those of the Navy
Department.

Susceptibility to Seasickness 289 


Fifty-one individuals made a total of 300 judgments as to other per- 
sons’ susceptibility. The greatest number of judgments made by a single 
person was 16, whereas 13 observers made only one or two judgments.
Having such a large number of individuals making judgments tends to
introduce a lack of consistency in estimations of susceptibility, but, on
the other hand, it does not overweight the judgment of any single person
who might have an unusual bias. 

Subsequent to the rating or ranking of the men according to their 
susceptibility to seasickness, and after the rating forms had all been col- 
lected, each member of the crew filled out a questionnaire on his motion- 
sickness history. At all times the men were aware that the study was for 
research purposes only and could in no way affect their naval status; a 
statement to this effect was also included in the heading on the question- 
naire. 

In addition to the questionnaire about past experience with the 
several forms of motion sickness, each person was asked to judge how 
susceptible to seasickness he thought he was. Thus, in addition to two in- 
dependent ratings by his associates or superiors, one self-rating of suscep- 
tibility was secured from each member of the crew. All of the ratings 
were made on a five point scale: (1) never gets seasick, (2) rarely gets 
seasick, (3) occasionally gets seasick, (4) often gets seasick, (5) practically
always gets seasick. The two independent ratings made possible an 
estimation of the reliability of the judgments. 

Reliability. A group of 544 recruits undergoing processing in a Naval 
Training Center was given the motion sickness questionnaire along with 
the other pencil and paper tests they were required to take.¹ No men-
tion was made of the fact that the questionnaire was to be used for re-
search purposes. Two days later another contingent of 459 recruits was 
given the questionnaire but they were told that it was for research pur- 
poses only and that they were not required to put their names on it. 
These two distributions were compared to determine whether the groups 
differed significantly as a result of the testing circumstances. 

The data of the signed group were analyzed to determine the split- 
half reliability of the questionnaire score. For this purpose, the ques- 
tionnaire items were first ranked on the basis of the number of positive 
answers from the whole sample. The items were then divided into two 
sets, the odd and the even ranking items, the two sets therefore having 
approximately equal totals of positive scores. Half-scores on individual 
questionnaires, as determined by this division of items, were then corre- 
lated by the contingency method. 
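The item-splitting procedure just described can be sketched as follows. The sketch assumes that each answer is scored simply as positive (1) or not (0), and the small data set is hypothetical; the scoring actually used for the questionnaire is not given in this section.

```python
def split_by_rank(responses):
    """Divide items into odd- and even-ranked sets.

    responses: list of per-person lists of 0/1 item answers.
    Items are ranked by their number of positive answers in the
    whole sample; ties keep the original item order.
    """
    n_items = len(responses[0])
    positives = [sum(person[i] for person in responses) for i in range(n_items)]
    ranked = sorted(range(n_items), key=lambda i: -positives[i])
    odd_items = ranked[0::2]   # ranks 1, 3, 5, ...
    even_items = ranked[1::2]  # ranks 2, 4, 6, ...
    return odd_items, even_items


def half_scores(person, odd_items, even_items):
    """Return the (odd half, even half) scores for one questionnaire."""
    return (sum(person[i] for i in odd_items),
            sum(person[i] for i in even_items))


# Hypothetical answers of four people to six items (not study data).
responses = [
    [1, 0, 1, 0, 0, 1],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 1, 0, 1, 0],
    [1, 0, 0, 0, 0, 0],
]
odd_items, even_items = split_by_rank(responses)
pairs = [half_scores(p, odd_items, even_items) for p in responses]
print(odd_items, even_items, pairs)
```

Alternating down the ranked list gives each half a nearly equal share of frequently and rarely endorsed items, which is the stated purpose of the ranking.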


1 Acknowledgment is gratefully extended to Captain C. M. Louttit, USNR, who
aided in the procurement of subjects for this part of the study. 





290 James E. Birren and M. Bruce Fisher 


Psychosomatic Relations. Although the preliminary study did not
reveal any relation between susceptibility to seasickness and certain psy-
chosomatic complaints, the possibility that such a relation existed seemed
important enough to require additional verification (2, 3, 4). This was
done by comparing the motion sickness questionnaire scores with scores
on the Cornell Selectee Index. The Cornell Index was used since it had
been validated in induction centers using psychiatric judgment as a crite-
rion (5). Designed to reveal defects of a neuropsychiatric or psychoso-
matic nature, it consists of a series of 64 questions requiring a “Yes” or
“No” answer.

The Cornell Index questions were printed on the reverse side of the
motion sickness questionnaire and were answered by the recruits in boot
camp. Correlations were determined between the questionnaire scores
and the Cornell Index for both the signed and unsigned groups. In addi-
tion to these two samples, a third sample was gathered later that consisted
of students and instructors in Class A Navy schools, all of whom had had
sea experience. The correlations were calculated independently for these
three groups to determine whether or not the conditions under which the
questions were answered altered the extent of the relation revealed by sta-
tistical analysis. The correlation was also computed for the total group
(N = 1410). In all instances, the distributions of scores on the Cornell
Index and the motion sickness questionnaire were so skewed to the right
(piling up of low scores) that a product moment correlation was unsound
and the coefficient of contingency was again used to determine the extent
of the relation.
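For readers unfamiliar with the coefficient of contingency, a minimal sketch follows. It assumes that the coefficient meant is Pearson's C, the square root of chi-square divided by the sum of chi-square and N. The 3 × 3 table is hypothetical and is not data from the study.

```python
import math


def contingency_coefficient(table):
    """Pearson's C = sqrt(chi2 / (chi2 + N)) for a table of counts.

    Assumes every row and column total is greater than zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return math.sqrt(chi2 / (chi2 + n))


# Hypothetical grouped scores on two measures (not study data).
table = [
    [60, 20, 5],
    [20, 30, 10],
    [5, 10, 20],
]
c = contingency_coefficient(table)
print(round(c, 2))
```

Because C depends only on the cell frequencies, it makes no assumption about the shape of the score distributions. Its upper limit is below 1.0 and depends on the number of classes, so coefficients from tables of different sizes are not strictly comparable.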


Validity of the Questionnaire 


Reliability of the Criterion. Since the principal criterion used to vali-
date the questionnaire was the combined judgment of two officers or petty
officers who knew the individual being judged for susceptibility, the re-
liability of the criterion is of considerable importance. The two sets of
judgments were found to be rather highly correlated, the coefficient of
contingency being 0.72 (N = 148). In order to check this estimate of
correlation, tetrachoric r’s were determined for the same data by dividing
the correlation diagram at three different pairs of axes, which yielded
r_tet’s of 0.68, 0.68 and 0.70. The reliability of the criterion based on the
combined ratings would therefore be in the range of 0.81 to 0.84 when
corrected by the Spearman-Brown formula, a reliability which is suffi-
ciently high for the present purpose.
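The Spearman-Brown correction referred to above can be illustrated briefly. The sketch assumes the standard form of the formula for a measure lengthened k times, with k = 2 for two combined raters. The quoted range of 0.81 to 0.84 appears to correspond to single-rater correlations running from 0.68 to 0.72.

```python
def spearman_brown(r, k=2):
    """Reliability of a measure lengthened k times, given reliability r."""
    return k * r / (1 + (k - 1) * r)


# Correlations between the two sets of ratings reported in the text.
single_rater = [0.68, 0.70, 0.72]
combined = [round(spearman_brown(r), 2) for r in single_rater]
print(combined)
```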

Self-judgments versus External or Combined Rater’s Judgments. When
the judgments of susceptibility to seasickness made by the individuals
themselves were correlated with the combined external ratings or judg-
ments made by their shipmates, the coefficient of contingency was 0.71,
indicating a significant relation between them. Inspection of the data
revealed that the self-judgments tended to indicate slightly higher sus-
ceptibility than the external judgments. The correlation plot used in the
calculation of the contingency (Table 1) shows the skewing characteristic
of all the rating scale data.
Table 1

Correlation Diagram Showing, in Destroyer Escort Personnel, the Relation between
Self-judgments of Susceptibility to Seasickness and the Average
of the Judgments by Two Other Persons

                                   Self-judgment
Judgment              Never      Rarely     Sometimes   Frequently or
by Others             Seasick    Seasick    Seasick     Always Seasick    Totals

Never Seasick
Rarely Seasick
Sometimes Seasick
Frequently or
  Always Seasick

[Cell entries illegible in the source; surviving figures: 49, 14, 12, 40, 26, 22.]





Correlation of Criteria with Questionnaire scores. The questionnaire 
scores were more highly correlated with the self-judgments than with the 
external judgments, the coefficients of contingency being 0.63 and 0.43, 
respectively (N = 150). It is speculative which of these correlations is 
more representative of the true relation. Whether the higher coefficient 
is to be interpreted as resulting primarily from a “contaminated” criterion,
since the self-rating followed the answering of the questionnaire, or
as resulting from the common introspective basis of both the questionnaire
score and the self-judgment is a question which cannot be answered by
these data. It follows, nevertheless, that a question concerning seasick- 
ness itself, if included in the questionnaire, would probably add to its 
validity. Such a question was the first item in an early form of the ques- 
tionnaire (2), but it was separated from the others in these studies in 
order to test the importance of the other motion-sickness history of sea- 
sickness. 

Predictive Efficiency of the Questionnaire. From the data of the de- 
stroyer escort crew it is possible to estimate the effect of using the motion 







sickness questionnaire as a screening procedure to reduce the proportion
of persons highly susceptible to seasickness in groups of naval personnel.
The two criteria, independent and self-judgments, were each used as a
basis for this estimate, and the two extreme categories of the rating scale
of susceptibility, “frequently” or “always” sick, were combined to define
the highly susceptible group in each case. The cost in personnel elimi-
nated and the composition of the group retained varies considerably with
the questionnaire score chosen as the cut-off point. This is true whether
self-judgment or judgment by others is used as the criterion. A cut-off
score of 13 on the questionnaire could be expected to eliminate 15 men for
every 100 men who were accepted. The accepted hundred would include
about 6 highly susceptible individuals instead of the 10 to 15 per cent who
were in the original unselected group; of the 15 screened out about half
would be nonsusceptible and half susceptible. The proportion of suscept-
ible individuals in the accepted group would decrease as the cut-off score


Table 2

Item Analysis of Motion Sickness Questionnaires*
Note: Number of cases = 150.

                                            Criterion
                                  Self-judgment    Independent Judgment
Question                              r_tet              r_tet

 1. Airplanes                          .36                .18
 2. Automobiles                        .65                .36
 3. Trains                             .46                .22
 4. Street cars                        .54                .20
 5. Subways                            .51                .13
 6. Busses                             .42             [illegible]
 7. Elevators                          .56             [illegible]
 8. Swings                             .62             [illegible]
 9. Hammocks                        [illegible]        [illegible]
10. Merry-go-rounds                 [illegible]        [illegible]
11. Ferris wheels                      .47             [illegible]
12. Roller coasters                    .61             [illegible]
13. Somersaults                        .44             [illegible]
14. Ring and bar gymnastics            .57             [illegible]
15. Ice skating                     [illegible]        [illegible]
16. Roller skating                  [illegible]        [illegible]
17. Dancing                         [illegible]        [illegible]

* Items asked the extent to which the vehicle or motion tended to make the indi-
vidual sick, e.g., produced headache, or nausea, or vomiting.
** Too few positive answers to permit calculation.







was lowered but at an increasing cost in the number eliminated. The 
choice of a cut-off score would ultimately depend upon the number of men 
available and the proportion which could be assigned to other duties than 
those in which seasickness can be a practical issue. These findings are in 
general agreement with the Wesleyan University studies on the use of a
questionnaire in predicting proneness to symptoms on a vertical acceler- 
ator (1). 
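The arithmetic behind the cut-off score of 13 can be laid out explicitly. The sketch below uses only the figures given in the text; "about half" of the 15 men eliminated is taken as exactly half, which is an assumption made for illustration.

```python
# Figures stated in the text for a cut-off score of 13.
accepted = 100
eliminated = 15
susceptible_accepted = 6

# "About half" of those screened out are susceptible (taken as exactly half).
susceptible_eliminated = eliminated / 2

tested = accepted + eliminated
susceptible_total = susceptible_accepted + susceptible_eliminated

# Proportion susceptible before and after screening.
base_rate = susceptible_total / tested
rate_after = susceptible_accepted / accepted

# Share of all susceptible men who are caught by the cut-off.
share_caught = susceptible_eliminated / susceptible_total

print(round(base_rate, 3), rate_after, round(share_caught, 2))
```

On these figures the implied proportion of susceptibles in the unselected group falls inside the stated range of 10 to 15 per cent, and the cut-off removes somewhat more than half of them at a cost of 15 men for every 100 accepted.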

Validity of Questionnaire Items. An analysis by the tetrachoric cor- 
relation technic was made of the validity of each item in the questionnaire, 
using both the average ratings by others and the self-judgments as criteria
(Table 2). In harmony with the validity coefficients of the total ques-
tionnaire score noted above, the item analysis showed consistently closer
relationship between the items and the self-judgment criterion than be- 
tween the items and the external judgment criterion. 


Reliability 


Signed vs. Unsigned Questionnaires. As stated in the procedure, the 
two recruit groups were tested to discover the effect on group scores of 
signing or not signing the questionnaire, on the assumption that any 
major differences between the scores of these two large samples would
reflect the proneness of the men to distort their answers when the results
of the questionnaire might affect their future. The two distributions
(Table 3) appear to be similar; both medians fall in the zero interval and
both third quartiles lie in the 4th interval. What differences do occur 
between the two distributions are small and in the direction of higher 
scores for the signed group. Although a chi-square test of the goodness 
of fit of one distribution to the other indicates that they are independent 
at the 1 per cent level of probability, the differences contributing to chi- 
square are distributed irregularly. The conclusion is still drawn, there- 
fore, that the influence of the situation on the questionnaire scores was 
minimal. This finding indicates that the questionnaire technic is a satis-
factorily reliable procedure insofar as the possible bias of signed or anon-
ymous report is concerned.

The split-half reliability of the questionnaire scores was estimated
from the coefficient of contingency between half-scores of the signed group
(N = 544). Three different combinations of class intervals, each making
up a 4 × 4 table, were used to make some evaluation of the importance
of the possible error involved in grouping the small numbers of extreme 
scores. The coefficients obtained, 0.82, 0.84 and 0.89, when corrected 
for test length, all indicate that the total questionnaire score has a relia- 
bility coefficient above 0.90. 




Table 3

Distributions of Motion Sickness Questionnaire Scores Collected under Two
Different Conditions, Signed and Unsigned

              Signed (N = 544)          Unsigned (N = 459)

[Table body illegible in the source.]


Psychosomatic Relations 


Both recruit groups who answered the motion sickness questionnaire 
also answered the Cornell Selectee Index questions at the same time. As
in the case of the motion sickness questionnaire, signing or not signing
did not make a significant difference in the distribution of scores on this
index. Both the difference between the two means and a chi-square
test of the goodness of fit of one distribution to the other had P values
greater than 30 per cent.

In addition to the two groups just mentioned, a group of Class A
school students and instructors with sea duty answered both the question-
naire and the Cornell Index questions. A coefficient of contingency be-
tween the two sets of scores in the three groups combined (N = 1410)
was 0.43. The coefficients for the individual groups were as follows:

Group A, recruits, unsigned ........... 0.44    N = 459
Group B, recruits, signed .............         N = 544
Group C, students and instructors ..... 0.31    N = 407

The consistency of the size of the coefficients in these large groups 
furnishes evidence that there is a positive but low order relation between 
susceptibility to motion sickness and frequency of psychosomatic com- 
plaints. 

Further corroboration of the position that no large share of the ex- 
planation of seasickness should rest on the personality structure of the 
individual is contained in a discussion of the findings on 48 persons who 
were diagnosed as chronically seasick (4).


Discussion 

The questionnaire method of predicting susceptibility to seasickness 
has been shown to be of considerable potential usefulness in screening men 
for sea duty. Since it has been frequently observed that the smaller 
ships and boats cause more seasickness than larger ships, it would appear 
wasteful to train a man for a highly skilled job aboard a smaller vessel 
only to find later that he was undesirable because of seasickness. Other 
things being equal, it does not appear efficient to assign men with lengthy
histories of motion sickness to duty aboard small ships.

A history of motion sickness may be gathered by interviewers, but
only at a great expense of time and with a lack of uniformity in ratings.
For large-scale selection procedures or screening, the questionnaire 
method gives promise of greater efficiency. 

In the presentation of the results on reliability it was pointed out that
the score distributions of the signed and unsigned questionnaires were not
[...]. In the group who signed the questionnaire, and
thought it might affect their future, some individuals may have reduced
their history so as not to affect their chances of getting sea duty. Others
[...] sea duty. If these two opposing tendencies had been important, an excess
of high and low scores over those found in the unsigned group would be
expected. The fact that such an effect was not noted is additional evi-
dence of the reliability of using the questionnaire under these conditions.







Some differences did occur, however, in the willingness to indicate past
history of motion sickness among the groups studied. The men on the
destroyer escort tended to indicate more history of motion sickness than
the recruits. This may have been due to an increased willingness to dis-
cuss their own symptoms of motion sickness after seeing seasickness
among their shipmates. The recruits, by comparison, are perhaps more 
conservative in their answers because of uncertainty. 

The analysis of the questionnaire and the Cornell Index on both the 
general Navy population and on the group of very susceptible men in-
dicates that the frequency of psychosomatic complaints is not closely
related to susceptibility to seasickness. This finding has an important 
bearing on the attitude that personnel may develop toward sea or air 
sickness. It does not justify the view that excessive susceptibility to 
motion sickness can be regarded as a psychiatric disability of the same 
sort as a psychoneurosis. 


Summary and Conclusions 


1. A questionnaire for use in predicting susceptibility to seasickness, 
shown by a preliminary study to have promise as a practical procedure, 
was given to three groups of naval personnel to determine its potential 
usefulness. The questionnaire required the individual to indicate how 
often and how severely he became motion sick on certain vehicles and 
devices, excluding ships and boats. 

2. The validity of the questionnaire was studied by comparing the 
questionnaire scores of 150 officers and men with sea experience on a 
destroyer escort with the opinions of two officers or petty officers who 
were in a position to know the relative susceptibility to seasickness of 
each man. 

3. In general, a person ranked himself as somewhat more susceptible 
to seasickness than did his associates. There was close agreement among 
the observers on the severe cases of seasickness, about 10 per cent of the 
150 men being indicated as highly susceptible. An individual’s ques- 
tionnaire score correlated more highly with his own estimate of suscep- 
tibility to seasickness than with other persons’ estimates of his suscep-
tibility.

4. The 10 per cent of the destroyer escort crew having the highest 
questionnaire scores included about half of those who were rated highly 
susceptible to seasickness. Elimination of this 10 per cent would there- 
fore have reduced the proportion of susceptibles from 10 in 100 to about 
5 in 100. Greater predictive efficiency for the questionnaire was in- 
dicated when self-judgments, rather than judgments by associates, were 
used as the criterion. 







5. The reliability of the questionnaire was determined by administer- 
ing it as part of the processing procedures in a boot camp. One group of 
men (N = 544) signed and answered the questionnaire without being 
told that it was not part of the routine examining procedure. A second 
group (N = 459) was given the questionnaire, but these men were told
that it was for research purposes only and that no names were required. 
The conclusion was drawn that the questionnaire scores were not signifi-
cantly different for the two types of administration.

6. The questionnaire is believed to have sufficient validity and re- 
liability to be of value in screening out persons who would be severely 
affected by seasickness. 

7. There is no clear evidence that personality disorders are character- 
istically associated with susceptibility to seasickness. 


Received May 13, 1946. 


References 


1. Alexander, S. J., Cotzen, M., Hill, C. J., Jr., Ricciuti, A., and Wendt, G. R. Wes-
   leyan University studies of motion sickness: VI. Prediction of sickness on a ver-
   tical accelerator by means of a motion sickness history questionnaire. J. Psy-
   chol., 1945, 20, 25-30.

2. Birren, J. E., Fisher, M. B., and Stormont, R. T. Evaluation of a motion sickness
   questionnaire in predicting susceptibility to seasickness. U.S. Nav. Med. Bull.,
   1945, 45, 629-634.

3. Birren, J. E., Stormont, R. T., and Pfeiffer, C. C. Reactions to neostigmine and
   apomorphine as indication of susceptibility to seasickness. Research Project
   X-278, Report No. 4, Naval Medical Research Institute, 26 Feb. 1945.

4. Birren, J. E. and Morales, M. F. Observations on men highly susceptible to sea-
   sickness with remarks on periodic motion of ships. Research Project X-278,
   Report No. 5, Naval Medical Research Institute, 17 Jul. 1945.

5. Weider, A., Mittelman, B., Wechsler, D., and Wolf, H. G. The Cornell Selectee
   Index: A method for quick testing of selectees for the armed forces. J. Amer.
   med. Ass., 1944, 124, 224-228.





A Study of the Promotion of Enlisted Men in the Army * 


Vernon Fox 


State Prison of Southern Michigan, Jackson, Michigan 


Intelligence, education, maturity, work experience, and other special
abilities have long been considered factors valuable to the individual in
dealing with his environment. In the army, where every enlisted man
begins at the same level, the opportunity to observe and measure these
abilities is unique. Such observation, though, frequently leads one to
wonder just what sort of abilities or characteristics assist in getting along
in the army.

The problem first came to mind in serious proportions when, in coun-
seling soldiers about to be discharged to civilian life, the writer had the
opportunity to talk with hundreds of veterans of varying abilities, back-
grounds, and army ranks. For instance, there was the man with a
master’s degree from Ohio State University who, after fifty-six months
in the army, four years of which was spent as a rifleman in the South-
west Pacific, had risen to the rank of private first class. Then too, there
was the staff sergeant of low borderline intelligence, who had required
nine years to complete six grades in grammar school, and was classified
in the army as a psychiatric social worker. These and similar apparent
inconsistencies gave rise to grave doubts and deep concern that stimulated
investigation.

The first, and to date the best, objective study of the promotion of
enlisted men in the army was reported in an article by Havighurst and
Russell.¹ They studied 163 cases from a “Midwest” town within 200
miles of Chicago, a rural county seat of 6000 population. They re-
ported a “very high positive relation between rank in the armed services
and educational attainment prior to entering the services.” Further,
they felt that they had “shown that rank in the service is highly related to
social status in the city of Midwest,” and suggest the hypothesis “that
social status may be the basic determining factor in promotion in the
services.” The inclusion of commissioned grades in Havighurst and
Russell’s study of promotion in the army introduces a selective educa-

* The opinions expressed in this article are the author’s, and do not necessarily
reflect the official attitude of the Army of the United States.

1 Havighurst, Robert J. and Russell, Mary. Promotion in the Armed Services in
relation to school attainment and social status. Sch. Rev., 1945, 53, 202-211.


Promotion of Enlisted Men in Army 299


tional factor, since requirements for admission to Officer’s Candidate
Schools or for direct commissioning most frequently include at least
graduation from high school. Social status, it is agreed, influences the
attainment of commissioned rank, and influences to some extent the level
of education to which an individual will aspire; consequently, it is felt
that Havighurst and Russell considered as a conclusion a factor they in-
troduced in setting up their study. It was further reported that “within
a single social status group, the members ranked according to educa-
tional level.” Nothing was reported concerning the ranking according
to social status within an educational group. It is felt that Havighurst 
and Russell generalized a little too freely, and without statistical proof. 
Since their article was but a part of a larger study, however, it may be 
hoped that their conclusions will be better supported when their main 
work appears. 


Procedure 


In the counseling branch of an army separation center, each soldier 
who is about to be discharged is counseled before his release to civilian 
life. As an army counselor in the Separation Center at Camp Atterbury,
Indiana, the investigator was enabled to collect objective data from the 
personnel records of the enlisted men he counseled. The following data 
were tabulated on 500 consecutive soldiers, or civilians-to-be, counseled 
in October and November, 1945: 

1. Rank
2. Army General Classification Test score
3. Radio Aptitude Test score
4. Mechanical Aptitude Test score
5. Clerical Aptitude Test score
6. Grade completed in school prior to entry into the army
7. Main occupation in civilian life
8. Years spent in main civilian occupation
9. Main military occupation
10. Total number of military specialties or occupations
11. Age upon entry into the army
12. Length of time spent in the army
13. Whether he was a draftee or a voluntary enlistee
14. Branch of service in the army


The sample was made up largely from the “fighting” units of the
army. The men who were separated from the army during October and
November were those with long periods of service and combat experience,
consequently in possession of a large number of points according to the
Adjusted Service Rating system for demobilization. There were veterans
from every major theater of operations: the European, the Mediterranean
and North African, the Middle Eastern, the China-Burma-India,
and the Asiatic-Pacific theaters. The sample included men from all
branches and arms of the services, including Infantry, Air Forces, Tank
Destroyer units, Field Artillery, Coast Artillery Anti-Aircraft units,
Combat Engineers, Quartermaster, Ordnance, Chemical Warfare Service,
and other units that had been either in combat or in immediate support
of combat units. All of the men had been in the army long enough to
allow conclusions to be drawn on the basis of their advancements in grade,
the average period of service having been three years and four months.
The “point” system had “selected” those men who had been in the army
long enough so that the time factor could be held rather constant.

There were 17 first or master sergeants, 25 technical sergeants, 67 
staff sergeants or technicians third grade, 88 sergeants or technicians 
fourth grade, 131 corporals or technicians fifth grade, 140 privates first 
class, and 32 privates. Pearson coefficients of correlation were computed 
between rank and other factors. Comparisons were made between the 
ranks of the draftees and those of the voluntary enlistees, and between the ranks
of men who were engaged in military duties similar to their civilian ex- 
periences and those who were engaged in military duties totally different 
from their civilian occupations. The men were interviewed during the 
counseling procedure with regard to their rank, and the manner in which 
it was earned—or not earned. 
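The Pearson coefficient used throughout the study can be illustrated with a minimal sketch. The function name `pearson_r`, the numeric coding of rank (7 for first or master sergeant down to 1 for private), and the sample values are all assumptions made for illustration; none of them come from the article's data.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations about the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


# Hypothetical illustration only: rank coded 1 (private) to 7 (first sergeant),
# paired with invented AGCT scores for seven men.
rank_codes = [1, 2, 2, 3, 4, 5, 7]
agct_scores = [82, 95, 88, 101, 97, 110, 118]
r_rank_agct = pearson_r(rank_codes, agct_scores)
```

How rank was actually coded for the correlations is not stated in the article, so the direction of the coding here is a guess; reversing it would only flip the sign of each coefficient.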


Results 


The results seemed at first to be rather inconclusive. There appeared 
to be no relationship between rank and many of the factors upon which 
data had been collected. Clerical aptitude and radio aptitude as deter- 
mined by army tests had no bearing upon the distribution of rank among 
the group of enlisted men studied. Although it was felt that maturity 
should probably serve at least partially as a basis for leadership, a coeffi- 
cient of correlation of —0.03 indicated that no relationship could be shown 
between rank and age. Because of the selective function of the “point” 
system of discharge, there was, of course, no significant relationship 
between rank and length of time the men in the group studied spent in 
the service. Stability as measured by the length of time the men remained
in a civilian occupational field bore no relationship to rank in the
army. Although none of the factors shows a relationship that can be 
considered very significant within the entire group, there are some trends 
that appear to merit discussion. 

Table 1 shows the correlations between rank and the factors considered.
All of the correlations are admittedly low. However, it is felt
that the three highest correlations, mechanical aptitude, intelligence as
measured by the AGCT, and grade completed in school, are high enough
to be noteworthy. In modern mechanized war, it would seem logical
that those men who were best able to manipulate machines of war would
tend to gain higher rank, so it is not surprising to find that mechanical
aptitude should show the highest correlations with rank in this group of
overseas veterans.


Table 1

Correlations between Rank and Factors in the Group

                                Number                  Standard     Correlation
Factor                          of Cases      Mean      Deviation    with Rank

Mechanical Aptitude             500           93.1      20.1         [illegible]
AGCT Scores                     500           95.1      21.8          .28
Grade Completed                 500            9.7       2.5          .23
Years in Army                   500            3.3       1.2          .14
No. Jobs in Army                500            2.0       1.0          .13
Radio Aptitude                  [illegible]   90.0      13.1          .10
Age in Years                    [illegible]   25.6       4.9         -.03
Clerical Aptitude               83            98.1      25.3         -.05
Years in Civilian Occupation    500            4.7       3.5         -.06

The influence of civilian skills on promotion in the army is shown in 
Table 2. It is readily apparent that the men who used civilian skills in 
the army tended to be retained in the lower grades, whereas advancements 
went to those who performed military work totally unrelated to their 
civilian skills or experiences. 

Men doing military work totally unlike their civilian skills held ranks
significantly higher than men doing work either identical with or related
to their civilian skills, as shown by critical ratios with the first distribution
of 40.5 and 6.8, respectively. This condition is undoubtedly the result of
the T/O, or Tables of Organization, system used by the army, in which
rank of each specialist is designated, and prevents advancement beyond
a level designated for each particular job. The men who used their civilian
skills obtained their military jobs comparatively early in their service,
and then found their ranks “frozen” by the T/O.


Table 2

Ranks of Men Using Civilian Skills Compared with Men Not Using Civilian Skills

            Men Doing Military Work       Men Doing
            Identical with or Related     Military Work
Enlisted    to Civilian Skills            Unrelated to
Grade       [one column illegible]        Civilian Skills    Total

1st           1                            16                 17
2nd           1                            24                 25
3rd          16                            50                 67
4th          37                            34                 88
5th          46                            41                131
6th          57                            37                140
7th          14                            13                 32

Totals      [illegible]                   [illegible]        500

Of the group of 500 men, 108 had enlisted voluntarily, and 392 had
been drafted. Actually, only 27 of the 108 volunteers had enlisted freely
and without regard to imminent conscription. The ranks achieved by
draftees as compared with those achieved by volunteers are shown in
Table 3. The volunteers hold significantly higher ranks than do the
draftees, as shown by a critical ratio of 5.0 between the two distributions.
In view of this significant difference, a comparison was made
between the draftees and volunteers on the basis of the other factors
considered in this study, and was tabulated in Table 4. A consistent
superiority of the draftee group over the volunteer group in all factors is
shown, the differences ranging from insignificant to very significant.
The consistency of the trend is noteworthy when compared with the data
in Table 3, which indicates that it is the volunteers [who hold the higher
ranks; the draftees lead] even in length of service, which may have
resulted from the pre-war peace-time draft.
The most significant differences are in chronological maturity and in
whatever stability may be indicated by length of experience in the man’s
civilian occupation. Within the group of volunteers, it is interesting to
find that the correlations between factors and rank are much higher than
in the group as a whole, and that the correlations between rank and these
factors in the draftee group become lower. This may result from the
fact that the volunteer group is somewhat less select a group than the
draftee group, or may indicate differences between these groups in their
motivation, morale, attitudes toward the military, and willingness to
cooperate with military order.


Table 3

Ranks Achieved by Draftees Compared with Ranks Achieved by Volunteers

Enlisted
Grade       Draftees    Volunteers    Total

1st            7           10           17
2nd           15           10           25
3rd           41           26           67
4th           70           18           88
5th          111           20          131
6th          121           19          140
7th           27            5           32

Totals       392          108          500


Table 4

Comparison of Draftee and Volunteer Groups with Regard to Factors Considered

                                   392 Draftees             108 Volunteers
                                         Correlation              Correlation    Critical
Factor                          Mean     with Rank       Mean     with Rank      Ratios

Mechanical Aptitude             93.5      .30            91.7     [illegible]    [illegible]
AGCT Scores                     95.7      .22            93.8     [illegible]    [illegible]
Grade Completed                 10.1      .20             8.4     [illegible]    [illegible]
Years in Army                    3.3      .13             3.2     [illegible]    [illegible]
No. Jobs in Army                 1.9      .12             2.2     [illegible]    [illegible]
Radio Aptitude                  90.1      .09            89.7     [illegible]    [illegible]
Age in Years                    25.9     -.03            24.4     [illegible]    [illegible]
Clerical Aptitude               99.5     -.05            96.3     [illegible]    [illegible]
Years in Civilian Occupation     4.8     -.05             3.4     [illegible]    [illegible]
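The critical ratio quoted for the draftee and volunteer rank distributions can be approximated with a short sketch: the difference between the two mean grades divided by the standard error of that difference. Several things here are assumptions rather than facts from the article: the function names, the use of grade numbers 1 to 7 as an interval scale, the sample-variance formula, and the draftee counts, which are inferred from the volunteer column and the overall rank totals because the scanned table is incomplete.

```python
from math import sqrt


def grouped_stats(counts):
    """Return (n, mean, sample variance) for grade numbers 1..k weighted by counts."""
    n = sum(counts)
    mean = sum(grade * c for grade, c in enumerate(counts, start=1)) / n
    var = sum(c * (grade - mean) ** 2 for grade, c in enumerate(counts, start=1)) / (n - 1)
    return n, mean, var


def critical_ratio(counts_a, counts_b):
    """Difference between mean grades divided by the standard error of the difference."""
    n_a, mean_a, var_a = grouped_stats(counts_a)
    n_b, mean_b, var_b = grouped_stats(counts_b)
    standard_error = sqrt(var_a / n_a + var_b / n_b)
    return (mean_a - mean_b) / standard_error


# Counts by enlisted grade, 1st (highest) through 7th (lowest).
# Volunteer counts are legible in the source; draftee counts are an inference
# from the rank totals (17, 25, 67, 88, 131, 140, 32), not a quoted figure.
draftees = [7, 15, 41, 70, 111, 121, 27]
volunteers = [10, 10, 26, 18, 20, 19, 5]

cr = critical_ratio(draftees, volunteers)
```

Under these assumptions the ratio comes out a little above 5, which is in line with the reported value of 5.0, though the author's exact computing formula is not given.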
The interviews with the 500 men with regard to rank were not particularly
productive in the search for sound reasons for differentiated
promotions. The majority of men reacted to questioning perceptibly,
however, and on an emotionalized basis. Apparently it was an issue of the
utmost importance to the men. There were 129 men who gave no adequate
reply to the questioning as to why they had or had not been promoted.
The main reasons given by the remainder of the group are listed
in Table 5.

It is apparent that by far the most important single factor considered
influential in promotion and non-promotion alike is the Table of Organization
system. With reference to the data in Table 5 it is recognized
that for many of the “non-promoted” group, the reasons given were forms
of rationalization and compensation. Personal relationships between the
officers in command of a unit and the men within the unit, however, must
surely have a bearing on the selection of men to be promoted to
non-commissioned ranks.




Conclusions 


In summary, it is noted that there is a slight tendency for intelligence,
mechanical aptitude, and educational background to influence
the awarding of rank among enlisted men in the army. Men who performed
military jobs similar to their civilian work, however, were not
frequently promoted. Enlistees tended to hold higher rank than did
the draftees. No significant relationship could be shown between rank
and such factors as age, stability on military or civilian jobs, number of
military specialties, main civilian occupation, nor radio and clerical aptitudes.
Length of service in the army was held constant by the selective
function of the “point” system of demobilization. It should be noted
particularly that within the group of volunteers, there was a greater correlation
between rank and such factors as educational background, intelligence,
and mechanical aptitude.


Table 5

Main Reasons Given for Promotion or Non-Promotion

                                                                        Number
                                                                        of Men
Reasons for Promotion:
1. Table of Organization provided for promotion of someone             [illegible]
2. Had a loud voice, easily heard by groups of men                     [illegible]
3. Had a “good personality,” got along well with others                [illegible]
4. Knew basic military drill when a new unit was activated             [illegible]
5. “Merit” (unable to explain further)                                 [illegible]
6. In an overseas theater, became friendly with commanding officer
   and/or his lieutenants at a drinking party                          [illegible]
7. In an overseas theater, kept commanding officer and/or his
   lieutenants supplied with native liquor and contacts with more
   receptive elements of feminine population                           [illegible]

Reasons for Non-Promotion:
1. Table of Organization prevented promotion, or was “clogged”         [illegible]
2. Personal likes and dislikes on basis of personality                 [illegible]
3. Refusal to furnish native liquor or women to the officers and
   non-commissioned officers in the unit overseas                      [illegible]
4. Army’s failure to recognize and reward education, civilian
   leadership experience, and other qualifications                     37

From the psychological standpoint, it is possible that the volunteers’
motivation, or rather, their more favorable attitudes toward things
military, stood them in better stead for advancement in army enlisted
grades. It is possible that their motivation and more favorable attitudes
toward military functions were considered to be of greater value to the
army than the relatively superior intelligence, education, civilian work
experience, and other abilities and qualifications of the draftees. This
suggests corroboration of the hypothesis so frequently proposed by regular
army officers during orientation lectures in the service that the
volunteer makes the better soldier because he wants to be in the army.
The motivation for the performance of jobs in the army has too often been
the threat of punishment if it isn’t performed, rather than any reward
from a job well done. A man who is a member of a military organization
against his will is less likely to favor that sort of motivation. The volunteer,
or enlistee, with his more favorable attitude toward things military in
the first place, seems better able to subject his own desires, plans, and
ambitions to the will of his superiors. He may even like to have his living
habits prescribed. The loss of his individuality to identification with the
group may seem pleasurable, for it relieves him of individual responsibility.
The ability to accept responsibility is a criterion of social maturity,
and the fact that rank has a negative correlation with age, slight as
it is, does not suggest that the chronologically mature man advances
faster in the army than the chronologically immature one. Perhaps the
same situation obtains with regard to social maturity.

It is readily apparent that there is no well-organized and planned
method of promoting men to enlisted grades in the army. Promotions appear
to be a matter of expediency. The army is apparently not interested
in obtaining the best man for the job, but rather, some man who can do the
job now, almost without regard for the quality of work to be done. Records
of men are not generally consulted for promotional purposes. The
old, familiar, enlisted man’s interpretation of gaining promotions as
being “in the right place at the right time” appears to have some merit.
At any rate, however, the man to be promoted must have accepted the
army system of regimentation at least to the extent that he does not fight
against it.


Received June 24, 1946. 





The Development of a Method of Appraisal for 
Assistant Managers * 


Leonard W. Ferguson 


Research Section, Field Training Division, Metropolitan Life Insurance Company, 
New York City 


The experiment described in this paper¹ was undertaken in order to
provide an objective, reliable and valid method for appraising “on the
job” performance of 2000 assistant managers in the Metropolitan Life
Insurance Company. These assistant managers take part in the management
of more than 800 district offices through which the Metropolitan
Life Insurance Company meets and services the needs of its present and
prospective policyholders. The appraisals secured are to serve as one of
the principal bases for the development of more adequate methods of
selecting, compensating, motivating and promoting these assistant
managers.

After a thorough job analysis undertaken by the writer’s predecessors,
Dr. John M. Willits and Mr. Eugene Asckenasy, the first step taken was
to collect 622 statements pertaining to various aspects of an assistant
manager’s performance. These statements were suggested by managers,
assistant managers, agents and other representatives of the Metropolitan’s
field organization; by a review of textbooks and appraisal forms
used in other companies; by exploratory research undertaken by Dr.
Willits and Mr. Asckenasy; and by the previous research and experience
of the writer. After proper editing, the statements were divided into
two separate groups of 311 statements each; following which, 50 agents,
50 assistant managers, 50 managers and 50 other representatives of the
Metropolitan’s field organization were asked to evaluate them on a
nine-step equal-appearing-interval scale.² With the exception of the 50
representatives last mentioned, who were asked to evaluate both groups of
statements, the judges who evaluated one set of statements were distinct
from those who evaluated the other set of statements. All ratings were
secured as a result of personal visitation and consultation with each of the
individuals who was asked to furnish a set of ratings. The gist of the
directions supplied was to the effect that the rater was to read each statement,
he was to assume that it applied to some one assistant manager,
and he was then to indicate which one of nine possible degrees of proficiency
was denoted by each of the statements in question.

* This article is a “prior publication,” the author paying complete costs. The
scheduled 80 pages per issue is thereby increased by the corresponding amount, thus
the “early publication” of this article is a direct contribution to the subscribers of the
Journal of Applied Psychology without handicap to those authors whose articles are
accepted and printed in their regular turn.

1 This paper under the title “The development of an adequate method of appraisal”
was presented at the Philadelphia meetings of the American Psychological Association,
Sept. 4-7, 1946, an abstract appearing in The American Psychologist, 1946, 1, 279.

2 The reader will recognize that this approach resembles the Richardson-Kuder
adaptation of the Thurstone attitude scaling technique to the construction of merit
rating scales. See Richardson, M. W. and Kuder, G. F., Making a rating scale that
measures. Person. J., 1933, 12, 36-40.

Appropriate analysis of the results shows no practical difference whatever
between the responses of the various groups of judges, but it does
reveal some very significant differences between each of the obtained 
distributions and those which, theoretically, might be expected. In 
contrast to expectation, ratings of 1, 2 and 3 (that is, highly unfavorable 
ratings) were assigned too frequently; while ratings of 4, 5 and 6 (that 
is, average and near average ratings) were assigned too infrequently. 
Ratings of 7, 8 and 9 (that is, highly favorable ratings) were assigned in 
close agreement with expectation. All estimates as to the proficiency 
level represented by each statement were translated into standard rat- 
ings, and from these standard ratings, the mean proficiency level that 
each statement was considered to represent was determined. These rat- 
ings were then used, in a manner later to be described, as one of the bases 
for selecting the statements to be included in the completed appraisal 
forms. 

Criterion scores for this study were established upon the basis of 
various ratings assigned to assistant managers by 104 Metropolitan field 
representatives whose duties bring them into close personal contact with 
assistant managers. Three types of rating, secured via the conference 
approach and under the writer’s immediate supervision, were obtained. 
These are: (a) a degree of acquaintance rating, (b) a numerical perform- 
ance rating, and (c) a paired comparisons rating. 

Each of the field representatives in question was first asked to indicate
whether he knew each of the assistant managers whose name appeared
on a previously prepared list: (a) extremely well; (b) moderately
well; (c) slightly; or (d) not at all. All ratings were to reflect the degree
and extent of actual observation of the assistant manager’s “on the job”
performance. Excluding the data for the category “not known at all,”
more than 3000 ratings for almost 1600 assistant managers were obtained.
Twenty-seven per cent of these ratings fell into the category “extremely
well acquainted,” 36 per cent of them fell into the category “moderately
well acquainted,” and 37 per cent of them fell into the category “only
slightly acquainted.”







The second step in obtaining criterion information was that of securing
overall numerical performance ratings on a nine-step rating scale.
Each field representative was asked to supply a rating for each assistant
manager whose performance he claimed to know. In explaining the
directions for this rating, considerable emphasis was placed upon the fact
that a rating of performance rather than a rating of capability was desired.
Furthermore, it was suggested that the ratings for a large and representative
group of assistant managers should be distributed in accordance
with the percentages predicted for a normal distribution. As a result
of these directions, more than 2500 ratings, distributed in the expected
manner, for more than 1400 assistant managers were secured.

The third step in securing criterion data was to request field repre- 
sentatives to judge the performance of some of the better known assistant 
managers in accordance with the paired comparisons method of rating. 
Therefore, while the field representatives were engaged in the process of 
assigning the numerical performance ratings just discussed, the writer 
determined the number of assistant managers with whom each field
representative was: (1) only slightly; (2) moderately well; or (3) extremely
well acquainted, and then listed the names of the better known assistant
managers along the side and top of a checkerboard work sheet to
facilitate the paired comparisons ratings. Each of these lists had to be 
prepared in a limited amount of time during a conference session, so it is 
quite likely that less care was exercised in their preparation than would 
otherwise have been possible and desirable. Nevertheless, the 1100 assis- 
tant managers whose names were listed are known significantly better 
than the total of all 1600 assistant managers for whom acquaintance 
ratings were obtained. 

At the outset of this study it was the writer’s intention to use as criterion
scores only those ratings secured via the paired comparisons technique.
The numerical performance ratings were merely to be used as
a commonsense check upon the many adjustments that were to be required.
However, when a very marked degree of correlation (varying,
in different samples, from .69 to .92 and having an average value of .81)
between the paired comparisons performance ratings and numerical performance
ratings was discovered, it appeared that it would be better to
make use of composite averages as the criterion scores. Therefore, after
adjustments to equate the ratings from several geographically separate
administrative units were effected, criterion scores were determined by
averaging together the numerical performance ratings and the paired comparisons
performance ratings.

From a complete summary of the number and type of all criterion
ratings secured; the number of field representatives who furnished these
ratings; and the number of assistant managers rated thereby, it is estimated
that the reliability of the various sets of numerical performance
[line missing in source]
of the paired comparisons performance ratings varies from .72 to .94.
By taking account of all types and combinations of ratings (with due
regard for their respective number), it is estimated that the overall reliability
of the entire series of criterion scores is slightly more than .75.

In the course of this study three sets of appraisal forms were prepared. 
The statements included in the first set of forms were selected upon the 
basis of the equal-appearing-interval values previously discussed in such 
a way: (1) that all intervals of performance were, as nearly as possible, 
equally well represented; and such (2) that the mean variability of the as- 
signed scale values, and consequently the degree of ambiguity of each 
statement, was a minimum. The two sets of 142 statements which met
these requirements were arranged in random order and in such fashion
that each statement could be said to be “true” or “false” with respect to
any assistant manager. These forms were sent to all managers in the
company reporting directly to the New York Head Office, half of them 
receiving Form A first and Form B a month later, whereas the other half 
received Form B first and Form A a month later. Both forms had official 
endorsement but it was understood by each manager that the results were 
to be used only in research. 

Completed forms were returned for over 1770 assistant managers. 
Perusal of the answers led to the conclusion that statements indicating de- 
sirable qualities were said to be “‘true”’ much too frequently; while state- 
ments indicating undesirable qualities were said to be ‘‘true’’ much too 
infrequently. As a result, these appraisals did not prove to be as mean- 
ingful as it was hoped they might be. Nevertheless, it proved possible to 
make use of the results since statements said to be true of more than 90 
per cent, or less than 10 per cent, of the assistant managers were elimi- 
nated from further consideration. 
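The elimination rule described above (dropping statements marked “true” for more than 90 per cent or fewer than 10 per cent of assistant managers) can be sketched as a simple filter. The function names and the tiny response matrix are hypothetical, introduced only to illustrate the rule; the article reports no item-level data.

```python
def endorsement_rates(responses):
    """Proportion of appraisals marking each statement "true".

    `responses` is a list of rows, one per assistant manager, each row a list
    of booleans with one entry per statement.
    """
    n = len(responses)
    if n == 0:
        raise ValueError("no appraisals supplied")
    n_items = len(responses[0])
    return [sum(1 for row in responses if row[j]) / n for j in range(n_items)]


def retained_items(rates, low=0.10, high=0.90):
    """Indices of statements whose endorsement rate lies within [low, high]."""
    return [j for j, p in enumerate(rates) if low <= p <= high]


# Hypothetical data: 10 appraisals of 3 statements.
# Statement 0 is "true" for everyone, statement 1 for six men, statement 2 for none.
answers = [[True, i < 6, False] for i in range(10)]
rates = endorsement_rates(answers)
kept = retained_items(rates)
```

Whether statements falling exactly at 10 or 90 per cent were kept is not stated in the article; this sketch keeps them.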

In view of these, and other results not here reported, it was felt that
in further experimentation it would be desirable to provide: (1) a larger
number of alternate answers from which the responses to each statement
could be selected; (2) a more detailed set of written directions; (3) oral
instruction; (4) a means of controlling overly generous responses; and (5)
personal supervision over completion of the forms. The necessary steps
to achieve each of these goals were undertaken and two new forms, each
consisting of 85 statements known to be true of somewhere between 10
and 90 per cent of the assistant managers, were prepared. These forms
were completed via the conference technique under the writer’s immediate
personal supervision. This necessitated the holding of 132 conferences
in 24 states, and took a period of 6 months to complete.
At each conference, 5 to 7 managers, a field representative and the writer
were in attendance. Directions were presented in both written and oral
[line missing in source]
eliminating all considerations other than those pertinent to appraising present
job performance. The idea of a normal distribution was discussed and
suggestions were made to the effect that the burden of proof with respect
to the truth of any answer was to be upon each individual manager.
Upon the completion of each form it was reviewed with the manager, with
the result that many managers revised their answers and thereafter contributed
a more conservative set of ratings. As a result of these conferences,
appraisals for more than 1740 assistant managers were secured.

In order to determine item validity, both Pearsonian and tetrachoric
correlations between the responses to each item and the criterion scores
were ascertained (N = 1008). These correlations vary from .00 to +.52,
and have an absolute mean value of .31. Fifty-two of the statements in
each form were found to correlate with the criterion to the extent of +.31
or more, and for this reason were selected for inclusion in the final scale.

In order to determine the proper scoring value for each of the five 
possible responses, which are that a statement³ is: (1) always or completely,
(2) usually or almost, (3) sometimes or moderately, (4) seldom or
slightly, or (5) never or not at all characteristic of an assistant manager, 
the differences between the responses found to be characteristic of 312 
relatively successful assistant managers (representing, upon the basis of 
criterion scores, the top 3 decile groups) and those found to be charac- 
teristic of 290 relatively unsuccessful assistant managers (representing 
the bottom 3 decile groups) were computed. These actual differences, 
except for minor adjustments necessary to assure logical continuity, wer 
accepted and used as the scoring values. 

When the two sets of 52 selected items are scored by use of the weights 
just described, it is found that the intercorrelation between the two forms 
ranges, in different samples, from .93 to .96; and that for all 1008 cases 
it is .95. The correlation between the scores on the odd and even num- 
bered statements for each form ranges, in different samples, from .87 to 
.95; while when based upon all cases it is .93. If these last values are
entered in the Spearman-Brown Prophecy formula, it may be estimated
that the reliability of each form ranges, in different samples, from .93 to
.98 and for all 1008 cases it has a value of .96.
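The step from odd-even correlations to full-form reliability follows the standard Spearman-Brown formula, r_full = k r / (1 + (k - 1) r), with k = 2 for a form twice the length of each half. The sketch below is only an illustration; the function name `spearman_brown` and its `factor` parameter are labels chosen here, not taken from the article.

```python
def spearman_brown(r_half, factor=2.0):
    """Estimated reliability of a test lengthened by `factor`.

    With factor=2 this steps up a split-half (odd-even) correlation to the
    reliability of the full-length form.
    """
    return factor * r_half / (1.0 + (factor - 1.0) * r_half)


# The odd-even correlation reported for all 1008 cases.
full_form_reliability = spearman_brown(0.93)
print(round(full_form_reliability, 2))  # 0.96
```

Entering the all-cases odd-even value of .93 reproduces the reported full-form reliability of .96.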




3 Sample statements are: “Helps Agent set up definite objectives; Arouses District
spirit and loyalty; Tends to be interested only in himself; and, Lacks snap in getting 
work done.” 







Validity was determined by correlating the appraisal scores with the
criterion scores previously discussed and is found to range, in different
samples, from .28 to .60. When these values are corrected for attenuation
they range from .32 to .69 and, when based upon all cases, the validity
is .52. In view of the many difficulties that had to be overcome in the
developmental stages of this study it is concluded that these results are
eminently satisfactory.⁴
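The correction for attenuation mentioned above divides an observed validity coefficient by the square root of the reliabilities being corrected for. The article does not say which reliabilities were used. The sketch below assumes that only the criterion's unreliability was corrected, using the overall criterion reliability of about .75 estimated earlier; that assumption happens to reproduce the reported endpoints, but it remains a guess. The function and parameter names are illustrative.

```python
from math import sqrt


def correct_for_attenuation(r_xy, criterion_reliability, predictor_reliability=1.0):
    """Observed correlation divided by the root of the product of the reliabilities.

    Leaving predictor_reliability at 1.0 corrects for criterion unreliability only.
    """
    return r_xy / sqrt(criterion_reliability * predictor_reliability)


# Assumed criterion reliability (the "slightly more than .75" estimate).
CRITERION_RELIABILITY = 0.75

low = correct_for_attenuation(0.28, CRITERION_RELIABILITY)
high = correct_for_attenuation(0.60, CRITERION_RELIABILITY)
print(round(low, 2), round(high, 2))  # 0.32 0.69
```

Correcting for the unreliability of the appraisal forms as well would raise the figures slightly further, which is one reason to think the reported values reflect a criterion-only correction.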

The Metropolitan Life Insurance Company has accepted these forms⁵
for official use in periodic and yearly appraisals of the performance of
assistant managers. As stated in the introduction to this paper, these
appraisals are to be used in the study and development of what is hoped
may prove to be more adequate methods of selecting, compensating,
motivating and promoting assistant managers.


Received February 5, 1947.


* In eight out of eleven separate administrative units the validity is .60 or greater.
In fact in these particular territories the validity averages around .64.

† These forms are not available for general distribution, but upon request to the
writer, copies of the forms will be made available to properly qualified individuals.





The Value of Thirteen Psychological Tests in 
Officer Candidate Screening * 


Milton B. Jensen 


Louisville, Kentucky 


and Julian B. Rotter 


Ohio State University 


From June 1943 to August 1944 the Office of the Military Psycholo- 
gist, The Armored School, Ft. Knox, Ky., was engaged in a project of 
examination and evaluation of the students in the Officer Candidate 
School. Seventeen classes, comprising 1492 officer candidates, were 
examined during this period with a battery of thirteen psychological 
tests. Some of the tests which did not appear useful were employed 
only during a portion of this period. It is the purpose of this paper to 
report the data and the impressions of the authors regarding the value 
of these tests for selection and prediction purposes. 

The tests were chosen or devised with the intent of developing tech- 
niques for selecting candidates who would successfully complete the 
course of instruction and be valuable officers. With this in mind several 
tests were included with full knowledge that they were too easy for the 
most capable officers and candidates. It was deemed more necessary 
that the tests establish critical levels for officer candidates than that they 
discriminate in the upper levels of ability. Limitations of time and 
clerical help necessitated this choice. While some tests differentiated
excellently between “failing” and “passing” candidates they were not
difficult enough to supply data of the type we should have liked for portions
of this study, particularly as regards critical ratios and intercorrelations.
The data are presented with understanding of these limitations. 


The Test Battery 
The tests, some of which contain more than a single measure, are
divided, for convenience, into two categories: ability tests and person- 
ality tests. They are listed below: 


A. Ability Tests

1. Personnel Test: This is a modification by Wonderlic of the Otis
Self-Administering Tests of Mental Ability, Higher Examination (13).
Administration time: 12 minutes.


*The complete records upon which this study is based are on file in the Adjutant 
General’s Department, U. S. Army. . 

Officer Candidate Screening 313 


2. Stanford Achievement Test: Advanced (6): Three parts of the test were
used: (a) Paragraph Meaning or reading comprehension, (b) Word Meaning
or vocabulary, and (c) Arithmetic Computation. Administration times:
(a) 20 minutes; (b) 10 minutes; (c) 30 minutes.

3. Test of Mechanical Comprehension, Form AA (1): The testee is presented
with drawings illustrating alternate solutions of problems based on
elementary principles of physics. His task is to choose the correct response
from two or more possible answers. Administration time: about 20 minutes.

4. Army General Classification Test: This general intelligence test constructed
by the Adjutant General’s Department consists of vocabulary, block
counting and arithmetic problems. Administration time: 40 minutes.

5. Speed of Substitution Test: A substitution test developed in the office of
the Military Psychologist consisting of thirteen trials with a complicated
(20 symbols) key. The key for each trial is different. Administration time:
30 minutes.

6. Rhythm Test: A part of the Seashore Measures of Musical Talent, 1939
Revision (10). Administration time: about 25 minutes.


B. Personality Tests 


1. P-Inventory: An emotional stability questionnaire. This is Part IV
(Psychasthenia) of the Minnesota Multiphasic Personality Schedule (4)
presented as a printed questionnaire. Administration time: about 10 minutes.

2. C-Inventory: Devised in the office of the Military Psychologist. Con- 
sists of 38 questions concerning the physical health of the examinee with par- 
ticular reference to complaints of a psychosomatic nature. The questions 
used were reworded and simplified, for the most part, from items of standard 
inventories. Administration time: about 8 minutes. 

3. Level of Aspiration: The substitution test also was used as a method of 
measuring level of aspiration. The scoring and methods of pattern analysis 
were developed by Rotter (9). For each of the 13 trials the subject makes an 
estimate of the number of substitutions he expects to achieve. Past research 
and clinical use suggest that the level of aspiration technique serves as an
index of such personality traits as aggressiveness, ambition, cautiousness, secu- 
rity, stability, and their opposites. Administration time: about 30 minutes. 

4. Thematic Apperception Test (7): From five to eight pictures in the The- 
matic Apperception Test were projected on a screen in a darkened room. The 
room was illuminated after each projection and the subjects wrote short stories 
about the pictures. A maximum time of eight minutes was allowed per story. 
The method was used in the present study to arrive at an estimate of the 
emotional maturity and stability of the candidates. Administration time: 
40 to 60 minutes. 

5. Honesty Tests (Circles and Squares): These two tests, which give the
subject an opportunity to cheat in situations where the examiner knows
reasonably well whether or not he has cheated, were originated by Cady and
modified by Terman for use in his studies of superior children (11). The basic
principle of these tests is that the tasks presented are impossible of
achievement, as shown by the performance of individuals actually blindfolded,
without recourse to cheating (opening the eyes). Administration time: about
10 minutes.

6. Harrower-Erickson Multiple-Choice Rorschach Test (3): The subject
selects from a choice of ten his response to each Rorschach plate as it is
projected on a screen. The test is purported to select emotionally unstable
individuals. Administration time: about 15 minutes.

7. Vocational Interest Schedule: Considered by its author, L. L. Thurstone,
to be a test of different areas in occupational interests (12). Administration
time: about 15 minutes.





Milton B. Jensen and Julian B. Rotter 


Procedure 
The evaluation of the tests used followed several different approaches:

1. Examination of 56 officers selected as outstanding by their superior
officers. These officers all volunteered to take the tests, although
some of them were not available for the entire battery. They were all
from 20 to 30 years old. It was thought that the more useful tests would
indicate some superiority of outstanding officers over unselected officer
candidates.


2. Prediction of academic success. Efficiency of various ability tests
to predict academic success or failure was determined. It became apparent
during the course of this investigation that no consistent policy
of eliminating candidates was in use other than for academic failure.
Consequently the personality battery could not be evaluated by the
criterion of graduation from OCS.

3. Obtaining ratings on candidates. For one test (the Rhythm Test)
ratings were obtained on the ten best and ten poorest candidates from
two classes from the point of view of ability to keep good cadence in
a marching group.

4. Clinical evaluation of individuals. On frequent occasions candidates
were referred for intensive individual examination. These examinations
gave an opportunity to judge the effectiveness of group tests.
Several officers were likewise referred for examination who had taken the
battery of tests as candidates.


Results 


During the course of the investigation several tests were dropped 
from the battery. The list of tests includes all the tests given for any
portion of the period during which this study was conducted. Tests 
eliminated from the battery are listed below along with reasons for elimi- 
nation. 

1. Thurstone Vocational Interest Schedule. This test was originally
included in the battery in the expectation that superior army officers
might show some consistent pattern or patterns of vocational interest.
The study of our 56 outstanding officers showed this not to be the case.
Every type of interest was represented, including individuals who showed
no interest in any of the categories and those who showed high interest in
all. Insofar as the younger officer candidates showed this trend to an
even greater degree it was not feasible to use it as a criterion for selecting
superior officers.

2. Army General Classification Test. This test, which the army used
during the period of this investigation as a selective measure for eligibility
to OCS, was employed for the greater part of the study and then discarded
on the basis of the data obtained from the study of prediction of academic
success. It was found that this instrument, which purported to test
the same abilities as the Personnel Test, was less effective for these purposes
and did not add to the value of the predictive battery. The time
for giving and scoring this test was four or more times that for the Personnel
Test.

3. The Rhythm Test. This test was added to the battery on request
of the Director of the Armored Officer Candidate School for a test which
would distinguish candidates who were unable to count cadence for military
drill. The ratings of instructors of the ten best and ten poorest
candidates in this regard from two classes (each with approximately
100 students) were obtained. Group averages and bi-serial correlations
for the 40 candidates showed no differences approaching significance.

4. The Harrower-Erickson Multiple-Choice Rorschach Test. Use of
this test according to the author’s norms resulted in the untenable con- 
clusion that 45% of the outstanding officers and 36% of officer candidates 
were seriously disturbed emotionally. Comparison with other person- 
ality tests and clinical evaluations likewise indicated the test to be in- 
valid for selecting officer candidates. Detailed results with this test 
have been reported separately by the authors (5). 

5. The Group Thematic Apperception Test. Although it was possible
to obtain reliable ratings of emotional stability from this test (2) it
was eliminated from the test battery after experimental trial with four
classes because of excessive time in administration and the need for
trained psychologists for interpretation.


Reliability of the Tests 


Table 1 gives reliability coefficients for the ability tests of the final
battery, for the most part taken from reports by test authors. Coefficients
of .78 and .79 for 258 subjects were calculated for P- and C-Inventories
respectively by the Kuder-Richardson method. Reliability coefficients for
the Level of Aspiration patterns and for the Honesty Tests were not
calculated.


Table 1
Reliability Coefficients for Ability Tests

Test                          Reliability Coefficient
Substitutions                 .93*
Personnel Test                .91
Paragraph Meaning             .94
Word Meaning                  .92
Arithmetic Computation        .91
Mechanical Comprehension      .84

* Calculated by the Kuder-Richardson method (8) and based on 259 cases.
Reliability coefficients for the other tests are taken from test manuals.
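
For readers unfamiliar with the Kuder-Richardson method, a minimal sketch follows. The paper does not say which Kuder-Richardson formula was applied, so formula 20 is assumed here. The response matrix is hypothetical and serves only to show the arithmetic.

```python
from statistics import pvariance


def kr20(item_matrix: list[list[int]]) -> float:
    """Kuder-Richardson formula 20 for dichotomous (0/1) items.

    Rows are examinees, columns are items. Population variance of the
    total scores is used, as in the classical statement of the formula.
    """
    n_people = len(item_matrix)
    n_items = len(item_matrix[0])
    totals = [sum(row) for row in item_matrix]
    total_variance = pvariance(totals)
    sum_pq = 0.0
    for j in range(n_items):
        p = sum(row[j] for row in item_matrix) / n_people  # proportion passing item j
        sum_pq += p * (1.0 - p)
    return (n_items / (n_items - 1)) * (1.0 - sum_pq / total_variance)


# Hypothetical responses: five examinees by four items (not data from the study).
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
reliability = kr20(responses)
print(f"KR-20 reliability: {reliability:.2f}")
```

With real inventory data the matrix would hold one row per subject (258 or 259 in the study) and one column per scored item.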


Comparison of Officer Candidates and Outstanding Officers 


Comparisons were made between the scores of “excellent” officers
and the first three classes of officer candidates studied. Large group
differences were not expected inasmuch as many of the officer candidates
potentially were excellent officers and since several of the tests used were
not difficult enough to differentiate between the most capable men.
Expectations were confirmed by analysis of the data. Where significant
differences between groups were found they favored the “excellent”
officers. Generally, medians rather than means are employed in the
comparisons since the median is a more meaningful measure with data
of this type.


Table 2
Comparisons between Outstanding Officers and Officer Candidates

[The first part of the table gives numbers of cases and medians for officers
and candidates on the Personnel, Paragraph Meaning, Word Meaning, Arithmetic
Computation, Mechanical Comprehension, Substitution, P-Inventory, C-Inventory,
and Group Rorschach tests. Most of its entries are illegible in this copy.]

                              No. of Cases    % Good Patterns   % Questionable
                              Off.   Cand.    Off.   Cand.      Off.   Cand.
Group Level of Aspiration     56     259      61     50         27     34

              No. of Cases     % Honest        % Dishonest     % Questionable
              Off.   Cand.     Off.   Cand.    Off.   Cand.    Off.   Cand.
Circles       ..     259       89     86       ..     ..       9      9
Squares       ..     259       70     82       ..     ..       23     12
Combined      ..     259       79.5   84.0     ..     ..       16.0   10.5

[Entries shown as .. are illegible in this copy.]


Statistically significant differences were found on the Personnel Test, 
Arithmetic Computation, and Mechanical Comprehension Tests. In 
each instance the officers were superior, showing better verbal intelligence, 
arithmetic ability, and mechanical comprehension. The Paragraph and 
Word Meaning Test gave a slight, but statistically insignificant trend in 
favor of the officers. The most likely explanation for officer superiority
is that the officer candidates include students who later became academic
failures or who had personality defects which precluded their becoming
officers, or having become officers, failed on actual assignments.

Among the personality tests the only one to yield differences tending 
towards significance was the Level of Aspiration. The critical ratio (C.R.) of
the difference between percentages of “good” patterns for the two groups was
1.5. The “excellent” officer group showed more of the good patterns indicating
aggressiveness and initiative (61% as against 50%), while the officer candidate
group contained more of the questionable, cautious patterns (34% as against
27%) and more of the poor patterns which are considered typical of individuals
who are either unstable or erratic (16% against 12%).
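
The critical ratio of 1.5 can be retraced from the reported percentages and group sizes (56 officers and 259 candidates, per Table 2). The sketch below assumes the unpooled standard error of a difference between two independent proportions. The paper does not state which form the authors used, and the function name is illustrative.

```python
from math import sqrt


def critical_ratio_proportions(p1: float, n1: int, p2: float, n2: int) -> float:
    """Difference between two independent proportions divided by its
    standard error (unpooled form, assumed here)."""
    standard_error = sqrt(p1 * (1.0 - p1) / n1 + p2 * (1.0 - p2) / n2)
    return (p1 - p2) / standard_error


# "Good" Level of Aspiration patterns: 61% of 56 officers, 50% of 259 candidates.
cr_good = critical_ratio_proportions(0.61, 56, 0.50, 259)
print(f"Critical ratio for good patterns: {cr_good:.2f}")
```

The result is close to the reported 1.5; a pooled standard error gives nearly the same value.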

The Rorschach Checklist, the C-Inventory and P-Inventory, Sub- 
stitutions, and Honesty Tests showed no group differences of statistical 
importance. 

Test Predictions of Academic Success in AOCS 


As an initial step in determining the value of the tests of ability in
predicting success in AOCS, classes 53 and 54 (the first to whom all the
tests were given) were used as criteria. The candidates in these classes 
were placed in three groups: (1) those who successfully completed the 
course; (2) those who failed the course for academic reasons as revealed 
by their course grades, regardless of official reasons given for their release 
from school; and (3) those who were relieved from school for a variety of 
reasons, such as transfer to other officer candidate schools, misconduct, 
physical disqualification, or resignation without apparent reason. Pre- 
diction of academic success from test scores involved only the groups of 
academically successful and unsuccessful candidates. 

In order to determine the degree to which a particular test predicted 
success, bi-serial correlation coefficients were calculated between the 
criterion of pass or fail officer candidate school and test score. The 
failure group consisted of all men who did not complete the course with 
an average of 70 or better or who had 3 or more failing grades on individ- 
ual tests and a failing average on all tests. These data are influenced by 
inadequate standardization of AOCS tests and by the fact that some men
passed the course with grade averages lower than others who resigned or
were “boarded out,” ostensibly for academic reasons.
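
The bi-serial coefficient used throughout this section can be sketched as follows. The classical formula is assumed: r = ((M_pass - M_fail) / σ) (pq / y), where y is the ordinate of the unit normal curve at the point dividing the proportions p and q. The scores and pass/fail labels are hypothetical and are not data from the study.

```python
from statistics import NormalDist, mean, pstdev


def biserial_r(scores: list[float], passed: list[bool]) -> float:
    """Bi-serial correlation between a continuous score and a pass/fail
    criterion assumed to reflect an underlying normal variable.

    Requires at least one case in each group (0 < p < 1).
    """
    pass_scores = [s for s, ok in zip(scores, passed) if ok]
    fail_scores = [s for s, ok in zip(scores, passed) if not ok]
    p = len(pass_scores) / len(scores)
    q = 1.0 - p
    unit_normal = NormalDist()
    # Ordinate of the unit normal curve at the point cutting off proportion p.
    y = unit_normal.pdf(unit_normal.inv_cdf(p))
    return (mean(pass_scores) - mean(fail_scores)) / pstdev(scores) * (p * q / y)


# Hypothetical test scores and outcomes for ten candidates.
scores = [82, 75, 90, 68, 77, 85, 60, 72, 71, 64]
passed = [True, True, True, True, False, True, False, True, True, False]

r_example = biserial_r(scores, passed)
print(f"Bi-serial r: {r_example:.2f}")
```

Because the coefficient rests on the assumption of normality behind the pass/fail split, it can exceed 1.0 when the two groups barely overlap, which is worth keeping in mind when reading very high values.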

Table 3 gives the number of cases and the average passing and failing
group scores, standard deviations of the entire group and bi-serial
correlations for all scores on each test in the ability battery with the criterion.


Table 3
Values of Ability Tests in Predicting Academic Success, AOCS
(Preliminary Evaluation)

Test                              N     Group     Mean     S.D.*   Bi-Serial r*
Arithmetic Computation            119   Success   79.4     ..      ..
                                  38    Failure   64.2
Personnel Test                    119   Success   ..       ..      ..
                                  38    Failure   ..
Army General Classification       119   Success   ..       ..      ..
                                  38    Failure   ..
Paragraph Meaning                 119   Success   ..       ..      ..
                                  38    Failure   ..
Speed of Substitutions            119   Success   309.1    ..      ..
                                  38    Failure   273.4
Word Meaning                      119   Success   87.7     ..      ..
                                  38    Failure   82.8
Mechanical Comprehension          119   Success   ..       ..      ..
                                  38    Failure   ..
Total Reading (New Stanford)      119   Success   ..       ..      ..
                                  38    Failure   ..

* Standard deviations and correlations are for the total group of 157, composed
of 119 successes and 38 failures.

[Entries shown as .. are illegible in this copy; the Total Reading failure row
shows only the fragments “59.” and “10.8”.]




In combining test scores to secure the best possible predictive measure
of academic success with the data available, weightings were made
in terms of variability and Bi-Serial correlation with “pass” or “fail,”
employing the following formula where X refers to the individual score, σ to
the standard deviation of the distribution, and r to the correlation
coefficient:

\[
\text{Composite Score} = \frac{X_1}{\sigma_1}\, r_1 + \frac{X_2}{\sigma_2}\, r_2 + \frac{X_3}{\sigma_3}\, r_3 + \cdots
\]

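
The weighting scheme just described can be illustrated with a small sketch. It reads the formula as a sum of terms r·X/σ, one per test, which is an interpretation of the verbal description (weights "in terms of variability and Bi-Serial correlation"). All numbers below are hypothetical, not values from the study.

```python
def composite_score(scores: list[float], sds: list[float], correlations: list[float]) -> float:
    """Weighted composite: each raw score is divided by its test's standard
    deviation and multiplied by that test's bi-serial correlation with the
    pass/fail criterion."""
    return sum(r * x / sd for x, sd, r in zip(scores, sds, correlations))


# Hypothetical candidate on three tests (e.g. reading, intelligence, arithmetic).
candidate_scores = [80.0, 30.0, 60.0]
test_sds = [10.0, 5.0, 12.0]          # assumed standard deviations
test_correlations = [0.5, 0.4, 0.6]   # assumed bi-serial correlations

example = composite_score(candidate_scores, test_sds, test_correlations)
print(f"Composite score: {example:.1f}")
```

Dividing by σ puts tests with different score ranges on a comparable footing, and multiplying by r gives more influence to the tests that separate passing from failing candidates most sharply.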
Table 4 gives the Bi-Serial correlations for various combinations of
tests with academic success in AOCS. The best combination for predictive
purposes is composed of Total Reading, Personnel and Arithmetic
Computation scores. These tests alone, when properly weighted, predicted
academic success better than when other tests were combined with
them.


Table 4
Bi-Serial Correlations of Statistically Weighted Test Scores on Various
Combinations of Tests with Academic Success, AOCS
(N = 157)

Test Combination                                                    Bi-Serial r
Arithmetic Computation and Personnel                                ..
Arithmetic Computation, Personnel, and Total Reading                ..
Arithmetic Computation, Personnel, Total Reading, and Army
  General Classification                                            .89
Arithmetic Computation, Personnel, Total Reading, and Speed of
  Substitutions                                                     .86

[Entries shown as .. are illegible in this copy.]

To test further the predictability of these same tests, bi-serial correlations
were determined for all AOCS candidates examined (classes 53-58).
These are given in Table 5. The correlation for the 55th and 56th classes
is .71 (182 cases). If the 3 most extreme scores are excluded the correlation
is raised to .83. The correlation for classes 57 and 58 is .72.


Table 5
Bi-Serial Correlations of Weighted Prediction Scores (Personnel, Arithmetic
Computation and Combined Reading) with Academic Success and
Academic Failures for 53-58 AOCS Classes

AOCS Classes                     Bi-Serial r
53 & 54 (Criterion Group)        .91
55 & 56                          .71
57 & 58                          .72
53 to 58 inclusive               ..

[The entry shown as .. is illegible in this copy.]

If prediction had been made as “pass” or “fail” in all of these 454
cases, 84% of the predictions would have been correct. By giving questionable
ratings to 57 candidates for whom statistical analysis indicated
a near 50-50 probability of success, the accuracy of prediction for the
remaining 397 (those for whom prediction was statistically sound) becomes
89%. Of those given questionable ratings 60% passed and 40% failed.

Bi-serial correlations and predictions of academic success for classes
59 through 65 were not entirely comparable with those for earlier classes.
Of the later classes, all but No. 59 were composed almost entirely of
younger ROTC candidates who were significantly higher in performance
than were AUS candidates. There was only 7% of failures among ROTC
candidates. Consequently the overall percentage of accurate prediction
of “pass” or “fail” was raised to 93% as against 89% for the earlier classes,
although the percent of error in predicting failures was increased. Also,
prediction of academic success was affected by changes in academic
standards. For example, the policy of failing students because of failure
in map reading, regardless of grades on other phases of the course, was
introduced shortly before graduation of the 58th class.

Intercorrelations for the various ability tests are given in the following 
table. Of definite interest is the high correlation between a test such as 
the Personnel Test measuring “general intelligence” and reading and 
arithmetic ability measured by standard achievement tests. These tests 
correlate higher with the Personnel Test (.58, .61, .58) than did the two
general intelligence tests (.47). The highest correlation between two
tests was that of the AGCT and Stanford Arithmetic Computation Test (.67).
Of additional interest was the fact that the Mechanical Comprehension 
Test correlated higher with the verbal (Personnel) general intelligence 
test and the reading comprehension test than it did with the AGCT or 
Arithmetic Computation Test. 


Table 6
Inter-Correlations of Ability Tests, N = 180

                             Mech.     Paragr.   Word
                             Compr.    Mean.     Mean.
Personnel                    .34       .58       .61
Mechanical Comprehension               .31       .28
Paragraph Meaning                                .61
Word Meaning
Arithmetic Computation

[The remaining columns of the table are not legible in this copy.]





Value of Personality Tests in Predicting 
Success in AOCS 


Statistical evaluation of the personality tests in predicting success 
in AOCS as was made with the ability tests could not be done since the 
school records relative to resignations and transfers usually did not give 
the fundamental reasons for relief from school. Consequently, no special 
group could be isolated which could be described as failing because of 
unsuitable personality. An attempt to select “personality” failures from 
examination of the records of men who resigned although they had passing
grades resulted in finding only a very few cases which could be considered
as unsuitable personality. This number was too small to be dealt
with statistically and the conclusion of personality inadequacy resulting
in failure rested upon supposition.


The value of the personality tests was partially determined from
testing of excellent officers, previously discussed, and from individual
examinations of men who were given these tests and later referred to the
Military Psychologist for examination. Such clinical examination suggested
that the Level of Aspiration Test described relatively accurately
some fundamental characteristics of the candidates and a high maladjustment
score on the personal inventories reflected a real maladjustment
in the subject. However, low inventory scores did not assure good adjustment.
As the number of “dishonest” cases in the test situation where
cheating was made easy was relatively small, opportunity for checkup on
the value of this test was limited.


Summary 


1. Thirteen psychological tests were evaluated in terms of their usefulness
in a battery of tests designed for selection and examination of
1492 officer candidates. Additional material for determining the utility
of the tests used came from the examination of 56 outstanding officers,
ratings of candidates and clinical evaluations of individuals who had
taken the battery of group tests.

2. Several instruments were eliminated from the battery of tests during
the 14 month period. AGCT was eliminated after it became clear
that it was less effective for the purpose of predicting academic success
in Armored Officer Candidate School than the shorter Personnel Test.
Use of the Multiple Choice Rorschach was discontinued because of lack
of validity for selecting outstanding officer candidates. The Rhythm
Test of the Seashore Measures of Musical Talent was discontinued when
it failed to select individuals who were unable to keep good cadence in
marching. The Group Thematic Apperception Test was discontinued because
of the excessive time required for administration and interpretation. The
Thurstone Vocational Interest Schedule was dropped when it became clear
that a group of outstanding officers did not present specific pattern or
patterns of vocational interest which could be used in selecting officers.

3. In predicting academic success the most efficient combination of
tests included Arithmetic Computation and combined Paragraph Meaning
and Word Meaning from the new Stanford Achievement Test, and the
Wonderlic Personnel Test. These 3 tests alone when properly weighted
showed high predictive value resulting in 89% correct prediction of academic
success or failure when questionable ratings were made for individuals
at the mid-point of the predictive scale.







4. Intercorrelations of some of the ability tests for 180 cases resulted in
some suggestive findings. The Personnel Test and the Army General
Classification Test, both purporting to be general measures of intelligence,
had a lower intercorrelation than did the Personnel Test with the achievement
tests of arithmetic and reading or did the AGCT with the Arithmetic
Computation Test. The Bennett-Fry Mechanical Comprehension Test
showed highest correlations with the highly verbal Personnel Test and
the Reading Comprehension Test.

5. The Group Level of Aspiration Test indicated promise for further
experimental exploration on the basis of the comparison of outstanding
officers with unselected officer candidates and on the basis of limited
clinical evaluations. Such clinical evaluations also suggested that the
two personality inventories were effective selective instruments when
very high scores (in the direction of maladjustment) were obtained. However,
low inventory scores did not assure good adjustment.


Received July 11, 1946. 


References 


1. Bennett, G. K. and Fry, D. E. Test of mechanical comprehension, form AA. The
Psychological Corporation, 1940.
2. Harrison, R. and Rotter, J. B. A note on the reliability of the Thematic
apperception test. J. Abnorm. soc. Psychol., 1945, 40, 97-99.
3. Harrower-Erickson, M. R. A multiple choice test for screening purposes.
Psychosom. Med., 1943, 5, 331-341.
4. Hathaway, S. R. and McKinley, J. C. A multiphasic personality schedule: IV.
Psychasthenia. J. appl. Psychol., 1942, 26, 614-624.
5. Jensen, M. B. and Rotter, J. B. The validity of the multiple choice Rorschach
test in officer candidate selection. Psychol. Bull., 1945, 42, 181-185.
6. Kelley, T. L., Ruch, G. M. and Terman, L. M. Stanford achievement test. World
Book Co., 1940.
7. Murray, H. A. Thematic apperception test. Harvard University Press, 1943.
8. Richardson, M. W. and Kuder, G. F. The calculation of test reliability coefficients
based on the method of rational equivalence. J. educ. Psychol., 1939, 30,
681-687.
9. Rotter, J. B. Level of aspiration as a method of studying personality: IV. The
analysis of patterns of response. J. soc. Psychol., 1945, 21, 159-177.
10. Seashore, C. E., Lewis, D. and Saetveit, J. G. Seashore measures of musical talent.
Educational Department, RCA Manufacturing Co., Inc., 1939.
11. Terman, L. M. Genetic studies of genius, Vol. I. Mental and physical traits of a
thousand gifted children. Stanford University Press, 1925, 497-499.
12. Thurstone, L. L. Vocational interest schedule. The Psychological Corporation,
1939.
13. Wonderlic, E. F. Personnel test. E. F. Wonderlic, Chicago, Ill., 1939.







The Discrepancy between Reported Schooling and Tested 
Scholastic Ability among Adolescent Delinquents 


Philip Ash 


American University 


The current marked increase in activity in the employment and
vocational counseling field reminds us of the importance of evaluating
accurately information submitted by counselees in order to maintain a
balance between the client’s aspirations and his abilities. In estimating
the potentialities of a counselee one significant qualification factor to be
taken into account is his reported scholastic achievement, usually expressed
in terms of school years completed. It is therefore pertinent to
inquire how valid an index of “scholastic ability” (and to an extent of
general intelligence) is the client’s statement of schooling completed.

While engaged as clinical psychologist at the New York City Refor- 
matory for Boys, the author observed that a large difference usually ob- 
tained between the grades which inmates reported they had completed, 
and their scholastic ability as determined by various tests. To determine 
the reliability of this observation, and the extent of the difference, a stat- 
istical analysis of the problem was undertaken. It is the opinion of the 
author that the results noted are applicable to the vocational counseling 
and employment placement situation as well. 


Data Used 


For a group of 85 male white adolescent delinquents, ranging in age
from 16 to 22 years, the author compared their reports of schooling completed
with their “scholastic ability” as measured by scores on the Otis
Classification Test, Part I. In addition, for 76 of the cases, a comparison
was made between reported schooling completed and “verified” schooling
completed, the verified school grades being based upon welfare reports.


Reported Schooling and Tested Scholastic Achievement 


None of the 85 subjects claimed to have finished more than 12.5 years 
of schooling (first semester college); none admitted to have completed 
less than 4.5 years (Grade 5A in the New York public school system). 
Table 1 shows the distribution by reported grade and tested ability. 

Table 1
Reported and Tested Scholastic Achievement of Adolescent
Delinquents Compared (85 Cases)

Tested Scholastic Achievement (School Years)*
3.0 3.5 4.0 4.5 5.0 5.5 6.0 6.5 7.0 7.5 8.0 8.5 9.0 9.5 10.0

[Scatter diagram of reported scholastic achievement (rows) against tested
scholastic achievement (columns). The cell frequencies and totals are not
legible in this copy.]

* Otis Classification Test, Part I, School Age Scores rounded to nearest half-year.

On the basis of scores on the Otis Classification Test, Part I, rounded
to nearest half-years, none of the subjects exceeded a “scholastic ability”
of more than 10.0 years (second year High School in New York). Nine
subjects rated at less than 4.5 years in scholastic ability.

Statistical analysis of this scatter diagram (see Table 2) revealed a
highly reliable difference of 2.07 years between reported schooling
completed and tested scholastic ability.

It may be noted that the average reported scholastic achievement
(8.38 years) represents approximately the full course of the New York
public school system; in terms of tested ability, the group displayed the
ability of sixth-graders.

It should also be observed that only a negligible relationship (r of .21)
was found between reported school achievement and tested scholastic
ability.

These results, especially when taken in conjunction with other dats : 
Yo 


presented in succeeding sections of this paper, seem to suggest that th 





g com- 


vement 


V York 


ved the 


ay 
OI .41 


rolastit 


er data 


hat the 


Schooling and Scholastic Ability 


Table 2

Statistical Analysis of the Discrepancy Between Tested and Reported
Scholastic Achievement

                              Reported        Tested
                             Scholastic     Scholastic
                             Achievement    Achievement    Differences

Mean (years)                  8.38 yrs.      6.31 yrs.      2.07 yrs.
σ                             1.43 yrs.      1.54 yrs.
σ mean                         .156           .168
σ difference                                                 .229
Critical Ratio
  (Diff./σ diff.)                                           9.04
σ diff. (corrected for r)*                                   .204
Critical Ratio
  (corrected for r)                                        10.147

* Pearson r = .206.


mechanics of the public school system tend to push students through to 
a nominal completion regardless of their demonstrated competence. 
They clearly suggest the inadequacy of reported schooling as a reliable in- 
dex of scholastic ability. 
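
The figures in Table 2 can be retraced with a short calculation. The sketch below assumes the usual formula for the standard error of a difference between two means, with an optional term for the correlation between the paired measures; the paper does not print the formula, so this is an assumption, although it reproduces the tabled values. The function names are illustrative only.

```python
import math


def standard_error_of_difference(se_a, se_b, r=0.0):
    """Standard error of the difference between two means.

    se_a, se_b: standard errors of the two means.
    r: correlation between the paired measures (0 ignores the correlation).
    Assumed formula: sqrt(se_a^2 + se_b^2 - 2 * r * se_a * se_b).
    """
    return math.sqrt(se_a ** 2 + se_b ** 2 - 2 * r * se_a * se_b)


def critical_ratio(difference, se_difference):
    """Ratio of a mean difference to its standard error."""
    return difference / se_difference


# Values taken from Table 2.
se_reported = 0.156   # standard error of mean reported achievement
se_tested = 0.168     # standard error of mean tested achievement
mean_difference = 2.07
r_reported_tested = 0.206

# Round to three places, as the table does, before forming the ratios.
se_plain = round(standard_error_of_difference(se_reported, se_tested), 3)
se_corrected = round(
    standard_error_of_difference(se_reported, se_tested, r_reported_tested), 3
)

cr_plain = critical_ratio(mean_difference, se_plain)
cr_corrected = critical_ratio(mean_difference, se_corrected)

print(se_plain, round(cr_plain, 2))          # 0.229 9.04
print(se_corrected, round(cr_corrected, 3))  # 0.204 10.147
```

Because reported and tested achievement are positively correlated, allowing for r shrinks the standard error of the difference and raises the critical ratio slightly.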

The author was unable to find studies directly comparable to the one 
made here. However, the results of Fernald, Hayes, and Dawley¹ may
be cited in partial corroboration. For a group of 437 women delinquents, 
they found the following: in reading, as measured by the Trabue Scale A, 
the group demonstrated ability at the fifth grade level; in spelling, on the 
Buckingham scale, at the sixth grade; in handwriting, on the Ayres scale, 
at the second grade; and in arithmetic, on the Courtis test, at the fourth 
grade. 

It should be noted, however, that their study was completed in 1920, 
that a sizable proportion (11.4%) of the subjects claimed to have com- 
pleted no schooling, and that the average schooling reported completed 
was only 4.88 years. The group studied in the present paper, coming 
under the jurisdiction of compulsory school laws in a metropolitan com- 
munity which energetically polices school attendance, would be expected 
to have completed more school years. 


Reported and Verified Schooling 


For 76 cases in the sample, records were available in the clinic indi- 
cating grade completed as determined by an investigator of the Social 
Investigation Unit of the Department of Correction. A comparison of 


¹ Fernald, M., Hayes, M., and Dawley, A. A study of women delinquents in New
York State. New York: The Century Company, 1920.







these records with school grade completed claimed by the subjects was
made to determine the extent of falsification of reported scholastic
achievement. See Table 3.
Table 3

Discrepancy Between Reported School Grade Completed and Verified
School Grade Completed (76 Cases)*

Discrepancy in School Years
(+ indicates reported grade in excess of verified grade; − indicates reported
grade lower than verified grade)

                       No Dis-    +2.5     +2.0     +1.0     +0.5     −1.0
             Total    crepancy    yrs.     yrs.     yrs.     yrs.     yrs.

Number         76        41         8        4       11        9        1  **
Per Cent     100.0%    54.0%     10.5%     5.3%    14.5%    11.8%     1.3%

* Based upon case records prepared by the NYC Department of Correction,
Social Investigation Unit.

** SIU records indicated “High School,” which was interpreted as completion of
12th grade; the inmates in these cases claimed less achievement.


Table 4

Discrepancy Between Reported and Verified Scholastic Achievement (76 Cases)

                     Reported        Verified
                    Achievement     Achievement    Differences

Mean (years)         8.32 yrs.       7.83 yrs.      .49 yrs.
σ                    1.51 yrs.       1.73 yrs.
σ mean                .174            .200
σ diff.
Critical Ratio                                      1.9





It may be noted that 54% of the subjects reported their scholastic
achievement correctly, as verified by the welfare reports. Twelve or
15.8% claimed to have completed two or more years of schooling more
than the records showed, while 3.9% claimed less schooling than was in-
dicated in the records. This last finding would suggest that the records
themselves were open to question. However, the difference between
claimed and verified schooling (Table 4) was less than half a year for the
group as a whole. Furthermore, this difference was of dubious statistical
reliability.

This finding agrees roughly with that of Fernald, Hayes, and Dawley
(op. cit.). They noted, for women in their sample under the age of 30
(and therefore presumably having come under compulsory school attend-
ance regulations) a difference between claimed and verified scholastic
achievement of .45 year. Since their sample was considerably larger, and







their standard errors correspondingly smaller, this difference was statis- 
tically reliable. 


Repetition of School Grades 


For a group of subjects as evidently retarded as the one under con- 
sideration, it would be expected that this retardation express itself in the 
character of their school careers. For 72 cases, information was available 
concerning the number of grades they repeated. 


Table 5

Incidence of Grade Repetition (72 Cases)*

                                                                     More than
               No Grade    One Grade   Two Grades   Three Grades   Three Grades
     Total     Repeated    Repeated     Repeated      Repeated       Repeated

       72          7          25           22            11              7
    100.0%       9.7%       34.7%        30.6%         15.3%           9.7%

* Based on SIU reports.


In Table 5 the experience of the group in this regard is cited. Only
9.7% of the subjects repeated no school grade; 55.6% repeated two or more
school grades. This result is somewhat in excess of that found by W. C.
Kvaraceus,² who noted that 44% of a group of delinquents he studied
repeated two or more grades.


Summary and Conclusions 


This study strongly suggests that schooling as indicated by reported 
school grade completed is a dubious indicator of scholastic ability, and 
should be used, if at all, with great caution in evaluating the capabilities of 

cent. 

The following specific points were made:


1. For a group of 85 delinquents, a discrepancy of 2.07 years was noted
between schooling reported completed (the higher) and tested scholastic
ability (the lower). This difference was highly reliable.

2. Only a negligible positive relationship (r = .21) was found between 
schooling completed and measured scholastic ability. 

3. An unreliable average difference of .49 year was found between
schooling reported completed by the subjects and schooling completed as 
verified in independent welfare reports. More than half the group re- 
ported their achievement accurately. In a non-delinquent group, this 


² Kvaraceus, W. C. Delinquency—A by-product of the school. Sch. & Soc., 1944,
59, 350-351.







proportion might even be greater. However, the desire to impress should
also be taken into consideration.

4. In this group, less than 10% of the subjects repeated no school
grade; more than half of them repeated two or more school grades.

5. These findings are, in general, corroborated by similar studies con-
cerning the schooling and scholastic ability of delinquents.

6. These findings suggest that, at least in the case of delinquents who
may have been mentally retarded, individuals are pushed through the
school system faster than their achievement and abilities warrant.


Received June 3, 1946. 





A System for Coding, Filing and Using Bibliographical 
Material for Research Purposes 


Robert P. Fischer 


University of Florida 


J. H. Rapparlie 


United States Rubber Company 


C. C. Gibbons 


Upjohn Institute for Community Research, Kalamazoo, Michigan 


The purpose of this paper is to suggest the use of the McBee Keysort 
system by research workers as a method for handling bibliographical 
material. 

Some of the criteria by which a system of bibliographical classifica- 
tion should be evaluated are simplicity, flexibility, economy, and speed 
of operation. The authors feel that the McBee Keysort system satis- 
factorily meets these criteria. The specific application of McBee Keysort 
to the field of psychology will be described as well as the mechanics of the 
Keysort system. 


Description of McBee Keysort 


The McBee Keysort system involves the use of an index or record card
into which coded information may be entered by punching (Figure 1).
The holes around the edges of the card are identified by numbers. When
certain holes are notched on a card, the resulting code pattern distin-
guishes that card from every other card not similarly punched.

The rows of holes along the edges of the card are divided into sections, 
each of which is identified by a title printed beneath it. A section may 
consist of one or more code fields. A code field contains the numbers 
0, 1, 2, 4, 7, and the letters “SF.” By using these numbers, any value
from 0 to 9 can be obtained; for example, 8 would be formed by notching 7
and 1; similarly 6 would be formed by notching 4 and 2. SF means
“single figure” and is notched whenever one desires to represent the num-
bers 1, 2, 4, or 7. In order to distinguish between single figures and com-
bination numbers (3, 5, 6, 8 and 9), SF is not notched for the combina-
tion numbers. For numbers of more than one digit, additional code fields
are required; for example, the number 460 would require three code fields, 
one for each of the three digits. 
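
The notching rule just described can be sketched in a few lines of Python. This is an illustration only, not part of the McBee system: the function names are hypothetical, and the handling of zero (notching the 0 hole alone) is an assumption, since the text does not say whether SF accompanies it.

```python
SINGLE_FIGURES = (1, 2, 4, 7)  # values printed in each code field, besides 0 and SF


def notches_for_digit(digit):
    """Return the set of holes to notch in one code field for a digit 0 to 9."""
    if digit not in range(10):
        raise ValueError("a code field holds one digit, 0 to 9")
    if digit == 0:
        # Assumption: zero is shown by notching the 0 hole alone.
        return {"0"}
    if digit in SINGLE_FIGURES:
        # Single figures are notched together with SF.
        return {str(digit), "SF"}
    # Combination numbers (3, 5, 6, 8, 9) are the sum of two holes, without SF.
    for i, first in enumerate(SINGLE_FIGURES):
        for second in SINGLE_FIGURES[i + 1:]:
            if first + second == digit:
                return {str(first), str(second)}
    raise AssertionError("unreachable for digits 0 to 9")


def digit_from_notches(notches):
    """Read a digit back from the notched holes of one code field."""
    return sum(int(hole) for hole in notches if hole != "SF")


def encode_number(number, fields):
    """Notch patterns for a number spread over the given count of code fields."""
    digits = str(number).zfill(fields)
    if len(digits) > fields:
        raise ValueError("number needs more code fields")
    return [notches_for_digit(int(d)) for d in digits]


# The paper's examples: 8 is 7 and 1; 6 is 4 and 2; 460 needs three fields.
print(sorted(notches_for_digit(8)))
print(sorted(notches_for_digit(6)))
print([sorted(field) for field in encode_number(460, 3)])
```

Each combination number has exactly one pair of holes that sums to it, which is why four numbered holes plus SF suffice for the ten digits.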


Fig. 1. Keysort Card for psychology.*
* These cards may be purchased from the McBee Company, Athens, Ohio, for
$25.06 per 1000. Special prices apply to larger quantities. The sorting equipment and
hand punch cost $35.00.







To locate cards containing any desired information, a series of thin
steel rods are arranged in a holder so that when the rods are passed
through the holes of the cards, those cards desired will drop, while the
undesired cards remain on the rods (Figure 2). It is possible to locate
all the cards one desires with a single sorting.


Fig. 2. Sorting bibliographical cards.
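
The sorting step can be modelled as a simple set test. The sketch below assumes that a card falls only when it is notched at every position through which a rod is passed; the text states only that the desired cards drop, so this reading of the mechanism, like the card names and the function name, is an assumption made for illustration.

```python
def sort_cards(cards, rod_positions):
    """Split cards into those that drop and those that stay on the rods.

    cards: mapping of card name to the set of notched hole labels.
    rod_positions: set of hole labels through which rods are passed.
    Assumed mechanism: a card drops only if notched at every rod position.
    """
    dropped, retained = [], []
    for name, notched in cards.items():
        if rod_positions <= notched:
            dropped.append(name)
        else:
            retained.append(name)
    return dropped, retained


# Three hypothetical cards, each showing one code field.
cards = {
    "card A": {"4", "SF"},   # single figure 4
    "card B": {"4", "2"},    # combination number 6
    "card C": {"2", "SF"},   # single figure 2
}

# Rods through 4 and SF pick out the cards coded 4 in one pass.
print(sort_cards(cards, {"4", "SF"}))

# Rods through 4 and 2 pick out the cards coded 6.
print(sort_cards(cards, {"4", "2"}))
```

Under this model the SF hole is what keeps a card coded 6 (notched at 4 and 2) from dropping when one is looking for cards coded 4.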


General Value for Bibliographies 


To illustrate the use of McBee Keysort for bibliographical purposes,
the authors explain here its application to the field of psychology. The
application to psychology is described in specific terms for the sake of
clarity. Scientists in other fields can make the modifications necessary
to adapt the technique for their own bibliographical work.


Application of Keysort to the Field of Psychology 


Psychologists doing research work would find it convenient to have
their bibliographical material punched on Keysort cards so that refer-
ences could be located easily regardless of how they are filed.

In order to use the McBee Keysort system for classifying biblio-
graphical material, it is necessary to have a code for the subject matter
to be classified. The authors propose a code to be used for classifying
material in the field of psychology. Three code fields, which allow a
total of 999 categories, are used to classify the subject matter of psy-
chology.


Subject-Matter Code for Psychology 


100 General and Theoretical
110 [illegible]
111 History and Elementary
112 Biological and Genetical
113 Experimental and Comparative
114 Social and Personality
115 Statistical
116 Educational and Child
117 Clinical and Abnormal
118 Industrial, Counseling, Personnel
120 Statistics
121 Psychophysical Methods
122 Tests of Significance (including analysis of variance)
123 Correlation (simple, curvilinear, multiple, etc.)
124 Factorial Methods
125 Item Analysis
126 [illegible] Aids
127 Graphic Methods
128 General (including experimental design and sampling)
130 Schools, History, Theoretical
131 Schools (including behaviorism, gestalt, etc.)
132 Biographical
133 Organizations and Societies
134 Theories of Ability
140 Military
141 Morale
142 Selection and Classification
143 Leadership and Discipline
144 Propaganda
145 Medical (including neuropsychiatrical)
146 Training
150 Learning
151 Theories
152 Factors Influencing Learning (i.e., knowledge of results)
153 Transfer of Training
154 Methods of Investigation (including trial and error, etc.)
155 Motivation
156 Conditioning
157 Memory
160 Adult Life and Old Age
161 Biological
162 Medical
163 Educational
164 Psychological
165 Sociological

200 Biological (including genetics)
210 Nervous System
211 Nervous System
212 Spinal Cord
213 Autonomic Nervous System
214 Pathological Conditions
215 Electroencephalography
216 Neurophysiology
220 Feelings and Emotions
221 Neurological (including autonomic)
222 Physiological (including glandular)
223 Psychological (reaction patterns)
230 Theoretical
240 Growth and Maturation
250 Drugs and Toxins
260 Medicine
270 Heredity

300 Educational
310 Guidance
320 Methods of Study
330 Intelligence
340 School Subjects and Methods of Teaching
350 Progressive Education
360 Special Abilities and Disabilities

400 Abnormal and Clinical
410 Methods of Diagnosis
420 Psychoneuroses
430 Psychoses
440 Feeblemindedness
450 Psychoanalysis
460 Therapies
470 Experimental
471 Experimental Neuroses
472 Frustration—Aggression

500 Industrial
510 Industrial Efficiency
511 Morale
512 Accidents and Safety
513 Fatigue
514 Monotony
515 Physical Environment (ventilation, etc.)
520 Employment Psychology
521 Rating Scales
522 Interviews
523 Criteria
530 Advertising and Selling
531 Methods and Techniques
540 Legal Psychology
541 Crime Detection
550 Public Attitudes
551 Propaganda
552 Polls and Surveys
560 Vocational Guidance







600 Tests and Measurements 
610 Achievement or Subject-Matter Tests 
611 Arithmetic and Mathematics 
612 Spelling 
613 Reading 
614 Language 
615 History 
616 Industrial Arts 
617 General Achievement Tests 
620 Intelligence Tests (verbal and performance) 
630 Aptitude Tests 
631 Special Abilities (including music, art, etc.) 
632 Manual Dexterity 
633 Scholastic Aptitude 
634 Miscellaneous Tests 
640 Standardization and Interpretation 
650 General and Test Catalogs 
660 Attitude and Interest Scales 
670 Rating Scales, Inventories, and Check Lists 


700 Social, Personality
710 Attitudes and Interests 
720 Character and Mores 
730 Personality 
731 Tests and Measurement 
732 Typologies 
733 Theories of Personality 
740 Racial Psychology, Anthropology, Language 
750 Religion and Ethics 
760 Sex Behavior 
770 Suicide 


None of the code fields is completely used by the standard code pro-
posed here. This fact provides considerable flexibility in the use of the
system. Aptitude tests, for example, are assigned the number “630” with
sub-classifications running to 634. The individual psychologist can 
assign code numbers from 635 to 639 to additional types of aptitude tests 
in which he is interested. Many benefits would result from the adoption 
by all psychologists of a standard code for classifying material in the field 
of psychology. One principal advantage would be the fact that cards 
belonging to different psychologists could be exchanged and duplicated. 
A number from this standard code might be given to each article at the 
time it is abstracted for Psychological Abstracts. 


Description of Keysort Card for Psychology 


The outline below lists the various sections of the Keysort card for
psychology (Figure 1) and describes the use of each section.

One or more holes are notched in each section to represent information
about the reference.







I. Author. Two code fields are required to classify an author by the first
two letters of his last name. With these two code fields, ninety-nine
alphabetical subdivisions may be used. A complete author code is as
follows:


Author Code*

[Only the subdivisions legible in the source are listed; code numbers are given where they can be read.]

Subdivision        Code No.

Bea to Bem
Bro to Bt
Caa to Car
Ci to Cn
Cor to Cq
De to Dh
Di to Dq
Dr to Dz
Fa to Fh              26
Fr to Fz              29
Ga to Gd              30
Ge to Gk              31
Gl to Gq              32
Haa to Hak
Har to Hd
Hea to Hem
Hen to Hh
Hi to Hn
Ho to Ht
Hu to Hz
Ia to Iz
Ke to Kh
Kr to Kz
La to Ld
Maa to Maq
McA to McF
McG to Md
Ni to Nz
Ph to Pz              69
Sa to Sch
Scha to Schl
Schm to Scht          78
Schu to Sd
Sm to Sn
Su to Sz
Wh to Wilk
Will to Ws

*This author code was checked against the distribution of names in the index of
Psychological Abstracts and the A. P. A. Directory and showed a fairly uniform num-
ber of names in each code classification.


II. Source of Reference.
    Journals—J
    Books—BK
    Popular sources—i.e., magazines, newspapers, etc.—POP

III. Availability of Reference.
    In personal library—L
    In Institutional library (not in personal library)—I
    Not in Institutional library—X

IV. Review of Other Literature on Same Topic
    (Historical Background Presented by Author)
    Comprehensive—C
    Selected References only—RO
    Scattered references throughout article—SC
    No references—NR

V. Subject Matter.


Three code fields are used to classify the subject matter of psychology. 
A subject-matter code has been given above. 


The lower row of holes in the Subject Matter section may be used for 
cross reference purposes so that each article or book can be assigned an
additional code classification if the article or book contains important 
material on two different fields. 


This cross-reference feature is one of the principal advantages of the 
system described in this article. 







VI. Procedure Used.
    Apparatus—AP
    Paper-pencil techniques (Tests, rating scales, questionnaires, inventories,
    checklists)—PP
    Clinical reports (Interviews, performance tests, case histories, therapies,
    diagnoses)—CL
    Statistical—ST
    Discussion (Nonexperimental, theoretical)—DIS
    Other—X

VII. Subjects Used.
    Animals—AN
    Infants—I
    Pre-school—PS
    School (including high school)—S
    College—
    Adults—AD

VIII. Evaluation of Reference (VALUE).
    Valuable—V
    General reference value—GR
    Only for exhaustive bibliography—B

IX. Research Possibilities Suggested by Reference (RES).
    At the time he reads an article or book, the psychologist may wish to
    indicate whether or not the reference suggests to him problems for future
    research.
    Suggests problems for further research—S
    No further research suggested by article—NO

X. Dewey Decimal Index.


In order to classify material of a general informational nature outside 
of the field of psychology, the Dewey Decimal Index for classifying 
bibliographical material is included on the Keysort card. By using 
3 code fields, 999 classifications are possible. The Dewey Decimal Code 
is shown below. 


A second row of holes in the section on the Dewey Decimal Index is 
provided for cross reference purposes. Each article or book can be 
assigned an additional Dewey Decimal classification if it contains im- 
portant material on two different fields. 


The Dewey Decimal Classification

000 General Works
010 Bibliography
020 Library Economy
030 General Encyclopedias
040 General Collected Essays
050 General Periodicals
060 General Societies
070 Journalism—Newspapers
080 Polygraphy—Special Libraries
090 Book Rarities

100 Philosophy
110 Metaphysics
120 Special Metaphysical Topics
130 Mind and Body
140 Philosophical Systems
150 Psychology
160 Logic—Dialectics
170 Ethics
180 Ancient Philosophers
190 Modern Philosophers

200 Religion
210 Natural Theology
220 Bible
230 Doctrinal Dogmatics
240 Devotional—Practical
250 Homiletic—Pastoral—Parochial
260 Church—Institutions—Work
270 General History of Church
280 Christian Churches and Sects
290 Nonchristian Religions

300 Social Sciences
310 Statistics
320 Political Science
330 Economics
340 Law
350 Administration
360 Associations—Institutions
370 Education
380 Commerce—Communication
390 Customs—Popular Life

400 Philology
410 Comparative
420 English
430 German
440 French
450 Italian
460 Spanish
470 Latin
480 Greek
490 Other Languages

500 Pure Science
510 Mathematics
520 Astronomy
530 Physics
540 Chemistry
550 Geology
560 Paleontology
570 Biology—Ethnology
580 Botany
590 Zoology

600 Useful Arts—Applied Science
610 Medicine
620 Engineering
630 Agriculture
640 Home Economics
650 Communications—Business
660 Chemical Technology
670 Manufactures
680 Mechanical Trades
690 Building

700 Fine Arts
710 Landscape Gardening
720 Architecture
730 Sculpture
740 Drawing—Decoration
750 Painting
760 Engraving
770 Photography
780 Music
790 Amusements

800 Literature
810 American
820 English
830 German
840 French
850 Italian
860 Spanish
870 Latin
880 Greek
890 Literature of Minor Languages

900 History
910 Geography and Travel
920 Biography
930 Ancient History
940 Europe
950 Asia
960 Africa
970 North America
980 South America
990 Oceania—Polar Regions


The three unused sections of the card may be used by each psycholo-
gist for classifying material of special interest to him.

In addition to the identifications and descriptions that are punched
into the card, certain bibliographical information must be written on the
face of the card to identify each reference. This information includes
name of author, title of article, source of article, date, volume number,
pages and library call number or abstract number. Space is also provided
for recording subject-matter code, Dewey Decimal code and the corre-
sponding cross reference.







A brief abstract of the article, with pertinent notes, should be written
on the card. In some cases this abstract will make it unnecessary to re-
fer to the article itself.

After the cards have been prepared they can be placed in a regular
card file. It is not necessary to file the cards in any particular order, for
the filing order in no way influences the efficiency or speed of sorting.


Summary 


This paper suggests the use of the McBee Keysort system as a method 
for handling bibliographical material. Application to the field of psy- 
chology is made for the purpose of illustration. A Keysort card is shown 
for the use of psychologists in recording pertinent information regarding 
each reference. The codes necessary for using the system are presented 
including a code for classifying the subject-matter of psychology. 

The technique may be modified to facilitate the handling of biblio- 
graphical material in other fields. 


Received July 5, 1946.


References
1. Dewey, Melville. Decimal classification and relative index. Lake Placid, New York:
Forest Press, 1942.
2. Enlisted Men—Initial Classification, Army Regulations, No. 615-25, Washington,
D. C.: War Department, July 31, 1942.
3. Louttit, C. M. Library classification for psychological literature. Psychol. Record,
1941, 4, 350-364.
4. McBee Keysort Manual. Athens, Ohio: The McBee Co., 1942.
5. Personnel Classification—Operation of the Marginal Punched Card Sorting System.
Technical Manual 12-490. Washington, D. C.: War Department, 1944.







News Notes 


An important new quarterly journal in the field of opinion polling and
attitude measurement has recently been launched. It is called Inter-
national Journal of Opinion and Attitude Research and is edited by Dr.
Laszlo Radvanyi, Professor of Social Psychology and Public Opinion in the
National University of Mexico, Mexico City, Mexico. The main purpose
of the Journal is to provide a forum where specialists in opinion and at-
titude research of all countries can present and discuss problems and ex-
periences. It will give particular importance to the problems of opinion
and attitude polls conducted on an international scale. It will emphasize
fundamental problems such as the determinants, structure and develop-
ment of opinions and attitudes. However, there will be no neglect of the
practical aspects of opinion and attitude measurement.

The subscription price is $4.00 per year. Single copies are $1.25 
Each number will contain approximately 120 pages. Communications 
should be addressed to the editor: Dr. Laszlo Radvanyi, Donato Guerra
1, Desp. 207, Mexico, D. F., Mexico.





The Institute of Mental Measurements of Rutgers University, New
Brunswick, New Jersey, is preparing a new edition of Oscar Buros’ Mental
Measurements Yearbook. The last edition was put out in 1940. The
war interrupted the work which is now being resumed. These Yearbooks
provide critical reviews of existing tests and measurements with the fol-
lowing major objectives: a. To provide test users with carefully pre-
pared appraisals of tests for their guidance in selecting and using tests; b.
To stimulate progress toward higher professional standards in the con-
struction and use of tests by commending good work, by censuring poor
work, and by suggesting improvements; and c. To impel test authors
and publishers to present more detailed information on the construction,
validation, uses, and misuses of their tests.





First award in the McGraw-Hill Book Company’s 1945-46 contest
for the three best manuscripts on nursing subjects was won by H. Phoebe
Gordon, Assistant Professor of Nursing and Student Counselor, Kath-
arine J. Densford, Director of the School of Nursing, and Edmund G.
Williamson, Dean of Students and Professor of Psychology—all of the
University of Minnesota—as co-authors of Counseling in Schools of Nurs-
ing, to be published soon by McGraw-Hill Book Company.


A new quarterly journal entitled Human Relations began publication
in April 1947. It is sponsored jointly by the Tavistock Institute of
Human Relations in England and the Research Center for Group Dy-
namics at Massachusetts Institute of Technology, Cambridge 39, Mas-
sachusetts, U. S. A. with Dorwin Cartwright as Chairman of the American
Editorial Committee. The journal will be devoted to interdisciplinary
research studies looking toward the integration of the social sciences.
Subscription price will be $7.00 per year.


The Society for the Psychological Study of Social Issues, representing
more than 600 American social scientists, has announced the Edward L.
Bernays Atomic Energy Award of a $1,000 U. S. Government Bond, to
be presented to the individual or group contributing, during 1947, the
best action-related research in the field of the social implications of atomic
energy. All communications concerning the Award should be addressed
to the Chairman of the Committee, Dr. David Krech, Swarthmore Col-
lege, Swarthmore, Penna.







Book Reviews 


Mayo, Elton. The social problems of an industrial civilization. Boston:
Division of Research, Harvard Business School, 1945. Pp. 159.
$2.50.

Social scientists will find this book both interesting and provocative.
Much of what Mayo evidently believes to be established fact will appear
to the reader as theory, assumption, guess, that needs extensive inves-
tigation.

“Economic theory in its human aspect is woefully insufficient; indeed 
it is absurd. Humanity is not adequately described as a horde of in-
dividuals, each actuated by self interest, each fighting his neighbor for
the scarce material of survival” (p. 59). Mayo agrees that this is true
in extreme emergencies “if no leader appears providentially to devise co- 
operative means of meeting the crisis” (p. 41). Quoting Le Play and 
Durkheim he contends that in simpler communities “the social code and 
the desires of the individual are for all practical purposes, identical. 
Every member of the group participates in social activities because it is 
his chief desire to do so” (p. 5). Modern industrial civilization has dis- 
rupted this social organization to a large extent and to the degree this 
is true man has rebelled against the system, frequently without under- 
standing the underlying cause of his grievances. The radical leader is a 
man of “social privations—a childhood devoid of normal and happy as- 
sociation in work and play with other children. This privation seemed 
to be the source of the inability to achieve ordinary human relationships, 
of the consequent conviction that the world was hostile, and of the re- 
action by attack upon the supposed enemy” (p. 26). 

Mayo summarizes his book as follows: “First, in industry and in other


groups and not with a horde of individuals. Wherever . . . by reason 
of external circumstance these groups have little opportunity to form, 
the immediate symptom is labor turn-over, absenteeism, and the like.
Man’s desire to be continuously associated in work with his fellows is a
strong, if not the strongest, human characteristic. Any disregard of it 
by management or any ill-advised attempt to defeat this human impulse 
leads instantly to some form of defeat for management itself. (The 
second conclusion is considered later on.) . . . Third, directly one in an 
administrative position discards the absurdities of the rabble hypothesis
and endeavors to deal directly with the situation that reveals itself on 
careful study, the results accomplished are astonishing” (pp. 111-112). 


Mayo bases his conclusions upon four investigations which he briefly 
reviews. They concern the introduction of rest periods in a textile com- 
pany, the change in attitudes of employees at the Hawthorne Plant, the 
investigation of absenteeism in New England industrial companies in 
1942, and in an aircraft industry in Southern California in 1943-1944.
Good morale did not result from introduction of rest periods or other 
changes in working conditions as such but these innovations afforded an 
opportunity for the individual employees to coalesce into a group. It 
was the establishment of social relationships among the individuals which 
was responsible for the improved morale. 

Administration of any group involves: (1) the satisfaction of material 
and economic needs and (2) the maintenance of spontaneous cooperation 
throughout the organization. Any organization must be: (1) effective
(accomplish the objective of the system) and also (2) efficient (satisfy in-
dividual motives). At the lower levels of an organization the employee
must learn: (1) to be a good workman and (2) to get on with his fellows. In
terms of science: (1) Technical skill manifests itself as a capacity to manip- 
ulate things in the service of human purposes and (2) Social skill shows 
itself as a capacity to receive communications from others, and to respond 
to the attitudes and ideas of others in such fashion as to promote congenial 
participation in a common task. 

Mayo believes the social sciences are woefully inadequate. Such 
sciences “encourage students to talk endlessly about alleged social prob- 
lems. They do not seem to equip students with a single social skill that 
is usable in ordinary human situations” (p. 20). Such students “have
developed capacity for dealing with complex logic, they have not acquired 
any skill in handling complicated facts” (p. 21). 

Mayo’s second conclusion to his book, dragged in at the last moment, 
is an attack on psychologists, separate from social scientists in general. 
“The belief that the behavior of an individual within the factory can be 
predicted before employment upon the basis of a laborious and minute ex- 
amination by tests of his technical and other capacities is mainly, if not 
wholly, mistaken. Examination of his developed social skills and his gen- 
eral adaptability might give better results” (p. 111). So far as the re- 
viewer can discover the only evidence in this book to support Mayo’s 
conclusion is the statement that there was no correlation between intel- 
ligence test scores and production records among six women employees. 

Granting Mayo’s claim that an employee must be a good workman 
and also must get along with his fellows, then if tests are to be used they 
should measure both these characteristics. That tests are largely inade- 
quate today for the second purpose does not necessarily imply that tests 
designed for the first purpose are mainly, if not wholly, worthless. 







In the fourth investigation reviewed by Mayo he found among 71
groups of workmen, nine which had no absenteeism, ten which had a
good record and 52 which had a poor record. He describes at some length
the characteristics of leadman Z who directed the work of one of the nine
excellent groups. “His chief activities were, first, helping individual
workers; second, ‘trouble-shooting’; and third, acting as contact man for
the group with the ‘outside world’ (i.e., the departmental foreman, time-
study men, inspectors).”

We do not learn from the fourth investigation itself or from Mayo’s 
book how there happened to be only nine good leaders among 71 all told.
If it is purely a matter of chance then nothing much can be done about it
except to hire 71 group leaders and eventually to replace 52 to 62 of them,
only to replace most of them again when it is apparent they do not have
the desired characteristics. The other alternatives are selection by some
means before employment and training after employment. Although 
men like leadman Z are essential to Mayo’s conception of good morale, 
he does not tell us how to find such relatively rare men. 
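
The counts behind the "52 to 62" range can be laid out in a short sketch. Reading the range as running from "poor groups only" to "all groups short of excellent" is an interpretation, not something the review states outright, and the variable names are illustrative.

```python
# Group counts reported for Mayo's fourth investigation.
groups = {"no absenteeism": 9, "good record": 10, "poor record": 52}

total = sum(groups.values())                      # all groups studied
poor_only = groups["poor record"]                 # assumed lower bound on replacements
not_excellent = total - groups["no absenteeism"]  # assumed upper bound on replacements

print(total, poor_only, not_excellent)
```

On this reading, the three categories account for all 71 groups, and the replacement range spans the poor groups alone up to every group that was not excellent.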

Edward K. Strong, Jr. 




Stanford University 


Lincoln, J. F. Lincoln’s incentive system. New York: McGraw-Hill,
1946. Pp. ix and 192. $2.00.

The President of the Lincoln Electric Company of Cleveland, Ohio, 
describes the achievements of his company and their experiences with in- 
centives. The company was started in 1895 on a capital of $250.00 of 
borrowed money. Today it is the world’s largest manufacturer of arc-
welding machines and electrodes. A series of graphs shows decreased
manufacturing and selling costs, increased take-home pay and stock div- 
idends for some or all of the years 1917 to 1944. For control purposes, 
data from other companies and industries are shown on some of the 
graphs, and these indicate no such favorable picture. 

The book may be divided into three sections. Chapter I deals with 
production and cost figures. Chapters II-VIII are primarily theoretical 
and interpretive, and Chapters IX—XII describe procedures in a general 
way.

Mr. Lincoln correctly recognizes the reason for the success of the 
Lincoln incentive system and he repeats and reemphasizes this through- 
out the book. The reason is that his incentive system is a management 
philosophy and not merely a method of determining pay. The reviewer
agrees with Mr. Lincoln when he writes: “. . . there is no doubt that the
incentive management philosophy outlined herein is fundamental to man
whether he is playing a game, raising a garden, or living a life” (viii).







The continued emphasis on incentive management in this book probably
contributes more to the growth and acceptance of sound human relations
in industry than do most textbooks on industrial psychology. It would
be highly desirable for psychologists to devote more of their energies to 
studying incentives and their relation to our social problems. 


The Lincoln system of incentive management calls for: (1) setting the 
best possible job method with a proper piece work price on it, and (2) 
regarding that as a contract that cannot be changed by management 
regardless of how great earnings become. In this manner there is an in- 
centive for workers to develop their techniques and their abilities. And 
in this situation the manager will not be considered a “cheating bastard”
(p. 145). “Piecework is basically for the purpose of inspiring and train-
ing the worker to do his best. It is not primarily for the purpose of get-
ting him to do more work for less” (pp. 146-7). The resulting reduced
labor cost per piece is but a small part of the savings. The increased 
production makes a decreased overhead which is a larger contribution. 

Mr. Lincoln is not a psychologist and one might expect errors in tech-
nical details. Such errors are scattered throughout. Also found in the
book is a surprisingly large number of correct usages of psychological
terms and technicalities. Persons other than psychologists will disagree 
with certain points. Stockholders probably do not approve the idea of 
sharing most of the greatly increased profits among the workers, the 
management, and the customers. As the author points out, these lat- 
ter groups are directly responsible for the increased profits and therefore 
deserve the lion’s share. Mr. Lincoln decries collective bargaining as it 
currently prevails (“civil war,” p. 94), but welcomes labor’s contributions
made through the provided channels. At his plant there is a labor- 
management advisory committee. 

The coldly factual reader will discern a lack of scientific objectivity 
and of rigid controls throughout this book. This is not “scientific” in-
dustrial management or psychology. Incentive management is, as Mr.
Lincoln says, “like a religious conversion” (p. 160). But industrial
human relations will probably never be “scientific” in the same way as,
say, comparative psychology. Lack of controls will probably always 
prevail, and will be overcome only by a huge amassing of detailed data 
and a careful comparing and analyzing of situations. This book fills a 
very useful place in the development of a “scientific industrial psychol- 
ogy” under that concept. 




Harold F. Rothe 


Stevenson, Jordan and Harrison, Inc., 
Chicago, Illinois







Finch, F. H. Enrollment increases and changes in the mental level of the
high-school population. Stanford University, California: Stanford
University Press, 1946. Pp. 75. $1.25.

Many persons, including psychologists, believe that the vast increase
in high-school enrollments since 1900 has led to marked reduction in the
mean IQ of the high-school population. Finch has investigated the mat-
ter, with results that many persons will consider astonishing.

The early literature contains estimates, inferences, and conclusions to 
the effect that the high-school population twenty or thirty years ago was 
restricted almost entirely to persons in the upper half of the distribution 
with respect to mentality, that is, to persons above 100 IQ. Careful
survey of relevant facts suggests that such conclusions had no adequate 
foundation. 

Finch’s investigation takes account of three main types of data. The 
first consists of evidence of the relationship of intelligence to persistence 
in school; such data indirectly support the inference that increase in high- 
school enrollments has led to lowering of the mental level of the students. 
The second type of data consists of survey results showing the mean IQ 
of high-school students as found in early studies and as found in recent 
investigations. Such data indicate no lowering whatever of the mean 
IQ level of the student body. 

The third and best type of data consists of repeated tests in the same 
schools, using the same tests. Finch reports results of such re-tests in 
two school systems in two different states. The original tests were made 
in the larger school in 1922, and the re-tests in 1941. In the other in- 
stance, the tests were made in 1926 and in 1942. In the larger school, 
the mean intelligence test score was reliably higher and the standard 
deviation of scores was smaller in 1942. In the smaller school, there were 
no differences. In both schools, enrollments had increased markedly, so 
that enrollments in 1941 and 1942 included much larger percentages of 
young persons of high-school age in the communities under consideration. 

Finch’s data clearly support the conclusion that increases in enroll- 
ment have not been accompanied by any lowering of the average mental 
level of students in our secondary schools. Those who find this conclusion 
hard to accept will certainly want to read the monograph in order to
find out in detail how carefully the work has been done. In explaining 
the findings, Finch favors the idea that the results are largely due to 
more effective elementary school education and more favorable out- 
of-school environment in recent years. A disturbing alternative comes 
to mind—perhaps Finch has only shown that the schools twenty to thirty 
years ago were far less select than was formerly believed. Finch con- 
cludes that it is now feasible to extend free secondary school education 







to all but the mentally defective. For the future, he recommends sweep- 
ing changes, in the direction of greater variety of curricula, and marked 
increase in the amount of personal and individual guidance offered to 
students. 

Finch’s report deals with a careful and extensive investigation of 
an important problem. His study is a significant contribution, having 
far-reaching implications, both for psychological theory and for educa- 
tional practice. His monograph will be read with interest by psycholo- 
gists, and persons interested in educational theory. It should be read 
by all persons interested in the secondary school system of the United 
States. Either Finch’s conclusions should be accepted, and his recom- 
mendations followed, or support should be found for alternative con- 
clusions. Further research on the problem should be started at once. 
Finch’s findings are too significant to be neglected. 

Harold D. Carter 

University of California 


White, J. Gustav. Changing your work. New York: Association Press,
1946. Pp. 210.

This book was written to help meet the needs of the large numbers of 
former service men and women, war industry workers, and others who 
are currently changing their jobs and occupations. It is directed to the 
person making vocational adjustments rather than to professional coun- 
selors, but has some value to the latter as an illustration of the approach 
used by one of considerable experience. 

For Dr. White is a dean of vocational counselors in the original sense 
of the term. He therefore draws on an experience far richer than most 
workers in the field, who tend to be much younger or to cease actual 
counseling and take up administrative, instructional, or research duties 
as they get older. As he demonstrates in Chapters 2, 12, 13, and 14, he
combines with this practical experience and knowledge of cases an in- 
timate knowledge of the economic and psychological factors affecting 
vocational adjustments. 

In passing, it may be worth commenting on the tendency of some coun-
selors to emphasize the number of cases they have counseled. One pro-
lific writer on guidance claims, in the preface to his book on counseling,
to have counseled 10,000 students in 12 years, or approximately 3 per
working day. White’s record is 25,000 cases in 25 years, or about 3 per
day. If time for re-interviews, recording, case conferences, correspond-
ence, etc. is also included, this leaves little time for the extensive and ex-
cellent administrative work, teaching and writing in which these coun-
selors have engaged. One is inclined, therefore, to wonder about the







duration, frequency, and intimacy of the contacts with the persons coun-
seled.
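
The caseload figures quoted above can be checked with a short sketch. The review does not say how many working days it assumes per year, so the 300-day year used here is an assumption, and `cases_per_working_day` is a hypothetical helper name.

```python
# ASSUMPTION: 300 working days per year; the review gives no figure.
WORKING_DAYS_PER_YEAR = 300


def cases_per_working_day(total_cases, years, days_per_year=WORKING_DAYS_PER_YEAR):
    """Return the mean number of cases counseled per working day."""
    return total_cases / (years * days_per_year)


# Figures quoted in the review.
other_writer = cases_per_working_day(10_000, 12)
white = cases_per_working_day(25_000, 25)

print(f"Other writer: {other_writer:.2f} cases per working day")
print(f"White: {white:.2f} cases per working day")
```

Under that assumption both records come to roughly three cases per working day, in line with the reviewer's figures; a shorter working year would raise both rates.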

The defects of this book are those commonly found in popular pres-
entations. Illustrative cases (there are a great many) are often over-
simplified. In emphasizing the practical and concrete as interest
arousers, the principles illustrated by a series of cases are not pointed out
clearly enough or discussed at sufficient length, thereby trusting too much
to the reader’s ability to generalize. Some cases seem to have no point
at all (e.g., paragraph 2, page 116). A list of tests is given on page . . .
following a statement which implies that the client will use it to choose
the tests he will take, perhaps being “guided by” the counselor’s “pref-
erences.” The sub-head “Personality Adjustment” in this list is followed
by a parenthetic “consult a catalogue of psychological tests for additional
suggestions.” The reviewer suspects that the publisher should be crit-
icized for this, since publishers often request the inclusion of such lists
in popular books; but authors who yield to such pressure, and who yield 
so injudiciously, also deserve censure. 

The inspirational tone will annoy some readers, especially the profes-
sionally trained. But the psychologist with some perspective will prob-
ably recognize that, in a world in which much counseling needs to take
people pretty much as they are and help them to become somewhat more
effective rather than perfectly adjusted, the inspirational approach is
often not without value. The role of the trained counselor is, despite
the list of tests, well emphasized. All in all, this is a book which counselors
should be familiar with and which they will find useful with some clients.

Donald E. Super


Department of Guidance, 
Teachers College, Columbia University 


Selling, Lowell S. and Ferraro, Mary Anne S. The psychology of diet and
nutrition. New York: W. W. Norton and Co., Inc., 1945. Pp. 192.
$2.75.

There is no area of physiological activity, except the sexual sphere,
as rich in psychological components as nutrition. There are the psy-
chophysiological problems of taste, of general hunger and its relation to
gastric contractions and body chemistry, and the question of specific
hungers. Emotional states, in their turn, influence stomach motility
and blood supply to the stomach wall as well as secretion of hydrochloric
acid and of saliva. Psychological aspects of fitness (sensory, motor,
intellective, and emotional) have been included in studies on human
nutritional requirements and effects of nutritional deficiencies. Appe-
tite, susceptible in a large measure to cultural influences, and food
habits are topics of clinical and social psychology.


The field of psychodietetics is ripe for a critical and systematic review
of experimental research and clinical experience. The present book,
oriented toward practical applications, takes up only some facets of this
problem complex. The question of how people may be induced to eat
physiologically needed food is its central topic. The volume is intended
for those who can practically utilize the knowledge of psychology as re-
lated to nutrition and diet: the physician who prescribes diets, the clinical
dietitian, the supervisor of an industrial feeding program, the psychia-
trist concerned both with the effects of malnutrition upon mental health
and the effects of psychological maladjustment upon eating, and, last
but not least, the housewife.

After a review of the basic psychological aspects of nutrition the au-
thors discuss food habits and fads, aversions and taboos. A large chapter
is devoted to children’s feeding problems at different age levels and to
special nutritional behavior problems. Both rejection of food, having
its climax in anorexia nervosa, and overeating which leads to obesity may
be a result of psychological dynamics, a reaction to feelings of insecurity
and frustration, an attempt to escape from a tense emotional situation.

In industrial feeding the management encounters psychological prob-
lems in attempting to develop in the worker favorable attitudes for
switching from haphazard box lunches to well balanced meals served in
the plant. When prescribing therapeutic diets for ambulant patients
the physician must consider the personality of the individual in order
to mobilize all the resources which will contribute to the patient’s ad-
herence to the dietary regimen. In some cases of obesity, it may be
added parenthetically, psychotherapy must parallel the dietotherapy
if satisfactory improvement is to be achieved and maintained. There is
a chapter dealing with the effects of deficient diets on behavior and per-
sonality. The book concludes with a discussion of the education of food
consumers in correct ways of feeding.

The volume is the result of a fruitful collaboration between a psy-
chiatrist and a dietitian. The rules of a good psychodietetic practice,
appended to a number of chapters, are sound and may serve as a useful
check list. However, the authors have not always escaped the temp-
tation of overdramatizing the importance of food: “If the practitioner (to
whom children are brought because they are not doing well in school)
is well versed in dietetics and mental disturbances which may result from
improper feeding, he will often [italics ours] recognize that the child’s
backwardness is the result of a vitamin deficiency, of a mineral deficiency,
or some other similar defect” (p. 16).







It is unfortunate that the authors did not cite the literature to which
they refer in the text. The material is scattered through a wide range
of publication media and the psychologist interested in the problems of
nutrition as well as the student of dietetics or public health who may
have used the book as a text would welcome the needed references to
original sources.

Josef Brozek 
Laboratory of Physiological Hygiene, 


University of Minnesota 





New Books, Monographs, and Pamphlets 


Books, monographs, and pamphlets for listing and possible review should be sent to
Donald G. Paterson, Editor, Department of Psychology, University
of Minnesota, Minneapolis 14, Minnesota.


Hindu psychology. Swami Akhilananda. New York: Harper and
Brothers, 1947. Pp. 241. $2.50.
Adjustment to physical handicap and illness: a survey of the social psy-
chology of physique and disability. Roger G. Barker, Beatrice A.
Wright, and Mollie R. Gonick. New York: Social Science Research
Council, 1946. Pp. 372. $2.00.
Educational opportunities for veterans. Francis J. Brown. New York:
Public Affairs Press, 1946. Pp. 142. $2.00.
Science and freedom. Lyman Bryson. New York: Columbia Univer-
sity Press, 1947. Pp. 191. $2.75.
Your stake in collective bargaining. Thomas R. Carskadon and S. T.
Williamson. New York: Public Affairs Committee, 1946. Pp. 32.
$.10.
Description and measurement of personality. R. B. Cattell. Yonkers:
World Book Co., 1946. Pp. 602. $4.00.
The application of the Rorschach test to young children. Mary Ford.
Minneapolis: The University of Minnesota Press, 1946. Pp. 114.
$2.00.
Personal counsel. A supplement to morals. Robert Frank. New York:
Informative Books, 1946. Pp. 306. $3.50.
Color and conscience: the irrepressible conflict. Buell G. Gallagher. New
York: Harper and Brothers, 1946. Pp. 244. $2.50.
Statistics in psychology and education. Third Edition. Henry E. Gar-
rett. New York: Longmans, Green and Co., 1947. Pp. 477. $4.00.
Human welfare and industrial efficiency: an introduction to industrial psy-
chology. L. S. Hearnshaw and R. Winterbourn. Wellington, C. L.,
N. Z.: A. H. and A. W. Reed, 1945. Pp. 169. 5s.
Human factors in management. Edited by Schuyler Dean Hoslett. 
Parkville, Mo.: Park College Press, 1946. Pp. 322. $4.00. 
Measurement in psychology. Thelma Hunt. New York: Prentice-Hall, 
Inc., 1947. Pp. 471. $3.50. 
How to use handicapped workers. Arthur T. Jacobs. Deep River, Conn.: 
National Foremen’s Institute, Inc., 1947. Pp. 186. $3.50. 


Should the government support science? Waldemar Kaempffert. New
York: Public Affairs Committee, 1946. Pp. 32. $.10.

The Navaho. Clyde Kluckhohn and Dorothea Leighton. Cambridge:
Harvard University Press, 1946. Pp. 258. $4.50.

The constitution and civil rights. Milton R. Konvitz. New York: Col-
umbia University Press, 1947. Pp. 254. $3.00.

Integrating high school and college. The six-four-four plan at work.
Leonard V. Koos. New York: Harper and Brothers, 1946. Pp. 293.
$3.00.

The people look at radio. Paul F. Lazarsfeld and Harry Field. Chapel
Hill, N. C.: University of North Carolina Press, 1946. Pp.13
$2.50.

Psychology of personality. R. Leeper. Ann Arbor: Edwards, 1946. Pp.
167. $2.00.

Guide to jobs: where and how to get them. Maxwell Lehman and Leor
Theil. Reader Service, 1946. Pp. 40.

The rediscovery of morals. Henry C. Link. New York: E. P. Dutton
and Co., Inc., 1947. Pp. 223. $2.50.

Accident prevention administration. F. G. Lippert. New York: Mc-
Graw-Hill Book Co., Inc., 1946. Pp. 275. $2.25.

Job evaluation methods. Charles W. Lytle. New York: The Ronald
Press Co., 1946. Pp. 329. $6.00.

Personnel administration in libraries. Lowell Martin, Editor. Chicago:
The University of Chicago Press, 1946. Pp. 160. $3.00.

Intellectual status at maturity as a criterion for selecting items in preschool
tests. Katharine M. Maurer. Minneapolis: The University of Min-
nesota Press, 1946. Pp. 166. $2.50.

Mass persuasion. Robert K. Merton, Marjorie Fiske and Alberta
Curtis. New York: Harper and Brothers, 1947. Pp. 210. $2.50.

Career opportunities. M. Morris, Editor. Washington, D. C.: Progress
Press, 1946. Pp. 354. $3.25.

Rebuilding the sales staff. Saul Poliak. New York: McGraw-Hill Book
Co., Inc., 1946. Pp. 503. $4.00.

Sexual inadequacy of the male. Paul Popenoe. Los Angeles: The Amer-
ican Institute of Family Relations, 1946. Pp. 41. $1.00.

The relation of parental authority to children’s behavior and attitudes.
Marian J. Radke. Minneapolis: The University of Minnesota Press,
1946. Pp. 123. $2.00.

Survey of personnel testing practices. Forrest V. Routt, Jr. San Fran-
cisco: California Council of Personnel Management, 1946. Pp. +
$1.00.







Developing your executive ability. Howard Smith. New York: McGraw-
Hill Book Co., Inc., 1946. Pp. 225. $2.50.

Personnel research and test development in the bureau of naval personnel. 
Dewey B. Stuitt. Princeton: Princeton University Press, 1947. Pp. 
500. $7.50. 

Delinquent girls in court. Paul W. Tappan. New York: Columbia Uni-
versity Press, 1947. Pp. 246. $3.00.

The job that fits you—and how to get it. John and Enid Wells. New 
York: Prentice-Hall, Inc., 1946. Pp. 423. $3.75. 

Industry and society. W. F. Whyte. New York: McGraw-Hill Book 
Co., Inc., 1946. Pp. 211. $2.50.

Psychiatric interviews with children. Helen Leland Witmer, Editor. 
New York: The Commonwealth Fund, 1946. Pp. 451. $4.50.


Jobs and the man. Luther E. Woodward and Thomas A. C. Rennie.
Deep River, Conn.: National Foremen’s Institute, Inc., 1946. Pp.
145. $2.00.

Selection and placement of new employees. Bulletin No. 9. Melbourne, 
Australia: Industrial Welfare Division, Department of Labour and
National Service, 1946. Pp. 38. 

Placement and probation in the public service. Committees on Placement 
and Probation in the Public Service. Chicago: Civil Service As- 
sembly, 1946. Pp. 201. $3.50. 

Guidance programs for schools of nursing. Committee on Vocational
Guidance. New York: National League of Nursing Education, 1946. 
Pp. 114. $2.00. 

Industrial films: a source of occupational information. Department of 
Labor. Washington, D. C.: Superintendent of Documents, U. S. 
Government Printing Office, 1946. Pp. 72. $.20. 

How to establish and maintain a personnel department. Third Edition. 
Research Report No. 4. New York: American Management Associa- 
tion, 1946. Pp. 116. $2.25. 

Manual of employment interviewing. Research Report No. 9. New 
York: American Management Association, 1946. Pp. 75. $1.50. 

Selected reading list on industrial relations for supervisors. Pasadena: 
Industrial Relations Section, California Institute of Technology, 1946. 
Pp. 8. $.30. 


How to prepare and publish an employee manual. Third Edition. New 
York: American Management Association, 1947. Pp. 35. $.75. 











